Abstract

In this paper we continue the investigation of the spectral theory and exponential 
asymptotics of primarily discrete-time Markov processes, following Kontoyiannis and Meyn 
[32]. We introduce a new family of nonlinear Lyapunov drift criteria, which characterize 
distinct subclasses of geometrically ergodic Markov processes in terms of simple inequalities 
for the nonlinear generator. We concentrate primarily on the class of multiplicatively regular 
Markov processes, which are characterized via simple conditions similar to (but weaker 
than) those of Donsker-Varadhan. For any such process $\Phi = \{\Phi(t)\}$ with transition kernel
P on a general state space X, the following are obtained. 

Spectral Theory: For a large class of (possibly unbounded) functionals $F : X \to \mathbb{C}$,
the kernel $\widehat P(x,dy) = e^{F(x)} P(x,dy)$ has a discrete spectrum in an appropriately
defined Banach space. It follows that there exists a "maximal" solution $(\lambda, \check f)$ to the
multiplicative Poisson equation, defined as the eigenvalue problem $\widehat P \check f = \lambda \check f$. The
functional $\Lambda(F) = \log(\lambda)$ is convex, smooth, and its convex dual $\Lambda^*$ is convex, with
compact sublevel sets.

Multiplicative Mean Ergodic Theorem: Consider the partial sums $\{S_t\}$ of the process with
respect to any one of the functionals $F(\Phi(t))$ considered above. The normalized mean
$E_x[\exp(S_t)]$ (and not the logarithm of the mean) converges to $\check f(x)$ exponentially fast,
where $\check f$ is the above solution of the multiplicative Poisson equation.

Multiplicative regularity: The Lyapunov drift criterion under which our results are derived 
is equivalent to the existence of regeneration times with finite exponential moments 
for the partial sums $\{S_t\}$, with respect to any functional $F$ in the above class.

Large Deviations: The sequence of empirical measures of $\{\Phi(t)\}$ satisfies a large deviations
principle in the "$\tau^{W_0}$-topology," a topology finer than the usual $\tau$-topology, generated
by the above class of functionals $F$ on $X$, which is strictly larger than $L_\infty(X)$. The rate
function of this LDP is $\Lambda^*$, and it is shown to coincide with the Donsker-Varadhan
rate function in terms of relative entropy.

Exact Large Deviations Asymptotics: The above partial sums $\{S_t\}$ are shown to satisfy an
exact large deviations expansion, analogous to that obtained by Bahadur and Ranga
Rao for independent random variables.

Keywords: Markov process, large deviations, entropy, stochastic Lyapunov function, 
empirical measures, nonlinear generator, large deviations principle.

1 Introduction and Main Results

Let $\Phi = \{\Phi(t) : t \in \mathbb{T}\}$ be a Markov process taking values in a Polish state space $X$, equipped
with its associated Borel $\sigma$-field $\mathcal{B}$. The time index $\mathbb{T}$ may be discrete, $\mathbb{T} = \mathbb{Z}_+$, or continuous,
$\mathbb{T} = \mathbb{R}_+$, but we specialize to the discrete-parameter case after Section 1.1.

The distribution of $\Phi$ is determined by its initial state $\Phi(0) = x \in X$, and the transition
semigroup $\{P^t : t \in \mathbb{T}\}$, where in discrete time all kernels $P^t$ are powers of the 1-step transition
kernel $P$. Throughout the paper we assume that $\Phi$ is $\psi$-irreducible and aperiodic. This means
that there is a $\sigma$-finite measure $\psi$ on $(X, \mathcal{B})$ such that, for any $A \in \mathcal{B}$ satisfying $\psi(A) > 0$ and
any initial condition $x$,

$$ P^t(x,A) > 0, \quad \text{for all } t \text{ sufficiently large.} $$

Moreover, we assume that $\psi$ is maximal in the sense that any other such $\psi'$ is absolutely
continuous with respect to $\psi$ (written $\psi' \prec \psi$).

For a $\psi$-irreducible Markov process it is known that ergodicity is equivalent to the existence
of a solution to the Lyapunov drift criterion (V3) below [34, 17]. Let $V : X \to (0, \infty]$ be an
extended-real valued function, with $V(x_0) < \infty$ for at least one $x_0 \in X$, and write $\mathcal{A}$ for the
(extended) generator of the semigroup $\{P^t : t \in \mathbb{T}\}$. This is equal to $\mathcal{A} = (P - I)$ in discrete
time (where $I = I(x, dy)$ denotes the identity kernel $\delta_x(dy)$), and in continuous time we think
of $\mathcal{A}$ as a generalization of the classical differential generator $\mathcal{A} = \frac{d}{dt} P^t |_{t=0}$.

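Recall that a function $s : X \to \mathbb{R}_+$ and a probability measure $\nu$ on $(X, \mathcal{B})$ are called small
if for some measure $m$ on $\mathbb{Z}_+$ with finite mean we have

$$ \sum_{t \ge 0} P^t(x, A)\, m(t) \ge s(x)\nu(A), \qquad x \in X,\ A \in \mathcal{B}. $$

A set $C$ is called small if $s = \epsilon \mathbb{I}_C$ is a small function for some $\epsilon > 0$. Also recall that an
arbitrary kernel $\widehat P = \widehat P(x,dy)$ acts linearly on functions $f : X \to \mathbb{C}$ and measures $\nu$ on $(X, \mathcal{B})$,
via

$$ \widehat P f(\,\cdot\,) = \int \widehat P(\,\cdot\,, dy) f(y) \quad \text{and} \quad \nu \widehat P(\,\cdot\,) = \int \nu(dx) \widehat P(x, \,\cdot\,), \quad \text{respectively.} \tag{1} $$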
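
On a finite state space the two actions in (1) reduce to matrix-vector products, which may help fix the conventions. The following is only a minimal sketch: the 3-state matrix `P`, the function `f` and the measure `nu` are hypothetical values chosen for illustration and do not come from the paper.

```python
def apply_to_function(P, f):
    """(Pf)(x) = sum_y P(x, y) f(y): the kernel acts on a function."""
    return [sum(p_xy * f_y for p_xy, f_y in zip(row, f)) for row in P]


def apply_to_measure(nu, P):
    """(nu P)(y) = sum_x nu(x) P(x, y): the kernel acts on a measure."""
    n_cols = len(P[0])
    return [sum(nu[x] * P[x][y] for x in range(len(P))) for y in range(n_cols)]


# Hypothetical 3-state transition matrix (each row sums to one).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
f = [1.0, 2.0, 4.0]      # a function on the state space
nu = [0.2, 0.3, 0.5]     # a probability measure on the state space

Pf = apply_to_function(P, f)
nuP = apply_to_measure(nu, P)

# Duality between the two actions: nu(Pf) equals (nu P)(f).
lhs = sum(n * v for n, v in zip(nu, Pf))
rhs = sum(m * v for m, v in zip(nuP, f))
```

Since `P` is a transition matrix, `nuP` is again a probability measure, and the two pairings `lhs` and `rhs` agree.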

We say that the Lyapunov drift condition (V3) holds with respect to the Lyapunov function
$V$ [34], if:

For a function $W : X \to [1,\infty)$, a small set $C \subset X$, and constants $\delta > 0$, $b < \infty$,
$$ \mathcal{A}V \le -\delta W + b\,\mathbb{I}_C, \quad \text{on } S_V := \{x : V(x) < \infty\}. \tag{V3} $$

Condition (V3) implies that the set $S_V$ is absorbing (and hence full), so that $V(x) < \infty$ a.e.
$[\psi]$; see [34, Proposition 4.2.3].

As in [34, 32], a central role in our development will be played by weighted $L_\infty$ spaces: For
any function $W: X \to (0, \infty]$, define the Banach space of complex-valued functions,

$$ L_\infty^W := \Bigl\{ g : X \to \mathbb{C} \ \text{s.t.}\ \sup_x \frac{|g(x)|}{W(x)} < \infty \Bigr\}, \tag{2} $$

with associated norm $\|g\|_W := \sup_x |g(x)|/W(x)$. We write $\mathcal{B}^+$ for the set of functions $s :
X \to [0, \infty]$ satisfying $\psi(s) := \int s(x)\,\psi(dx) > 0$, and, with a slight abuse of notation, we write
$A \in \mathcal{B}^+$ if $A \in \mathcal{B}$ and $\psi(A) > 0$ (i.e., the indicator function $\mathbb{I}_A$ is in $\mathcal{B}^+$). Also, we let $\mathcal{M}_1^W$
denote the Banach space of signed and possibly complex-valued measures $\mu$ on $(X, \mathcal{B})$ satisfying

$$ \|\mu\|_W := \sup_{F \in L_\infty^W,\ F \neq 0} \frac{|\mu(F)|}{\|F\|_W} < \infty. $$

The following consequences of (V3) may be found in [34, Theorem 14.0.1]. 

Theorem 1.1 (Ergodicity) Suppose that $\Phi$ is a $\psi$-irreducible and aperiodic discrete-time
chain, and that condition (V3) is satisfied. Then the following properties hold:

1. (W-ergodicity) The process is positive recurrent with a unique invariant probability
measure $\pi \in \mathcal{M}_1^W$ and for all $x \in S_V$,

$$ \sup_{F \in L_\infty^W,\ \|F\|_W \le 1} \bigl| P^t(x,F) - \pi(F) \bigr| \to 0, \qquad t \to \infty, $$

$$ \frac{1}{T} \sum_{t=0}^{T-1} F(\Phi(t)) \to \pi(F) := \int F(y)\,\pi(dy), \qquad T \to \infty, \ \text{a.s. } [P_x], \quad F \in L_\infty^W, $$

where $P_x$ denotes the conditional distribution of $\Phi$ given $\Phi(0) = x$.

2. (W-regularity) For any $A \in \mathcal{B}^+$ there exists $c = c(A) < \infty$ such that

$$ E_x\Bigl[ \sum_{t=0}^{\tau_A - 1} W(\Phi(t)) \Bigr] \le \delta^{-1} V(x) + c, \qquad x \in X, $$

where $E_x$ is the expectation with respect to $P_x$, and the hitting times $\tau_A$ are defined as,

$$ \tau_A := \inf\{t \ge 1 : \Phi(t) \in A\}, \qquad A \in \mathcal{B}. \tag{3} $$

3. (Fundamental Kernel) There exists a linear operator $Z : L_\infty^W \to L_\infty^{V+1}$, the fundamental
kernel, such that

$$ \mathcal{A} Z F = -F + \pi(F), \qquad F \in L_\infty^W. $$

That is, the function $\hat F := ZF$ solves the Poisson equation, $\mathcal{A}\hat F = -F + \pi(F)$.



1.1 Multiplicative Ergodic Theory 

The ergodic theory outlined in Theorem 1.1 is based upon consideration of the semigroup of
linear operators $\{P^t\}$ acting on the Banach space $L_\infty^V$. In particular, the ergodic behavior of
the corresponding Markov process can be determined via the generator $\mathcal{A}$ of this semigroup.
In this paper we show that the foundations of the multiplicative ergodic theory and of the large
deviations behavior of $\Phi$ can be developed in analogy to the linear theory, by shifting attention
from the semigroup of linear operators $\{P^t\}$ to the family of nonlinear, convex operators $\{\mathcal{W}^t\}$
defined, for appropriate $G$, by

$$ \mathcal{W}^t G(x) := \log\bigl(E_x\bigl[e^{G(\Phi(t))}\bigr]\bigr), \qquad x \in X,\ t \in \mathbb{T}. $$

Formally, we would like to define the 'generator' $\mathcal{H}$ associated with $\{\mathcal{W}^t\}$ by letting
$\mathcal{H} = (\mathcal{W} - I)$ in discrete time and $\mathcal{H} = \frac{d}{dt}\mathcal{W}^t|_{t=0}$ in continuous time. Observing that
$\mathcal{W}^t G = \log(P^t e^G)$, in discrete time we have

$$ \mathcal{H}G = (\mathcal{W} - I)G = \log(P e^G) - G = \log(e^{-G} P e^G), $$

and in continuous time we can similarly calculate,

$$ \mathcal{H}G = \lim_{t \downarrow 0} \frac{1}{t}[\mathcal{W}^t - I]G = \lim_{t \downarrow 0} \frac{1}{t}\log(e^{-G} P^t e^G) = e^{-G}\mathcal{A}e^G, $$

whenever all the above limits exist. Rather than assume differentiability, we use these expressions
as motivation for the following rigorous definition of the nonlinear generator,

$$ \mathcal{H}(G) = \begin{cases} \log(e^{-G} P e^G) & \text{discrete time } (\mathbb{T} = \mathbb{Z}_+); \\ e^{-G}\mathcal{A}e^G & \text{continuous time } (\mathbb{T} = \mathbb{R}_+), \end{cases} \tag{4} $$

when $e^G$ is in the domain of the extended generator. In continuous time, this is Fleming's
nonlinear generator; see [22] for a starting point, and [20, 21] for recent surveys.
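
For a chain on a finite state space the discrete-time case of (4) can be evaluated directly. The sketch below is illustrative only: the matrix `P` and the function `G` are hypothetical, and the helper names are not from the paper. It also evaluates the linear generator $P - I$ on the same function, since Jensen's inequality gives $\log(Pe^G) \ge PG$ pointwise.

```python
import math


def nonlinear_generator(P, G):
    """H(G)(x) = log( e^{-G(x)} * sum_y P(x, y) e^{G(y)} ) in discrete time."""
    return [math.log(sum(p * math.exp(g) for p, g in zip(row, G))) - G[x]
            for x, row in enumerate(P)]


def linear_generator(P, G):
    """(A G)(x) = (P G)(x) - G(x), the discrete-time generator A = P - I."""
    return [sum(p * g for p, g in zip(row, G)) - G[x]
            for x, row in enumerate(P)]


# Hypothetical 3-state transition matrix and test function.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
G = [0.0, 1.0, 3.0]

HG = nonlinear_generator(P, G)
AG = linear_generator(P, G)

# Constants are annihilated by the nonlinear generator, as by P - I.
H_const = nonlinear_generator(P, [2.0, 2.0, 2.0])
```

In this example `HG` dominates `AG` coordinate by coordinate, so an upper bound on the nonlinear generator is automatically an upper bound on the linear one.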

In this paper our main focus will be on the following 'multiplicative' analog of (V3), where
the role of the generator is now played by the nonlinear generator $\mathcal{H}$. We say that the Lyapunov
drift criterion (DV3) holds with respect to the Lyapunov function $V : X \to (0, \infty]$, if:

For a function $W: X \to [1, \infty)$, a small set $C \subset X$, and constants $\delta > 0$, $b < \infty$,
$$ \mathcal{H}(V) \le -\delta W + b\,\mathbb{I}_C, \quad \text{on } S_V. \tag{DV3} $$

[This condition was introduced in [32], under the name (mV3).] Under either condition (V3)
or (DV3), we let $\{C_W(r)\}$ denote the sublevel sets of $W$:

$$ C_W(r) = \{y : W(y) \le r\}, \qquad r \in \mathbb{R}. \tag{5} $$

The main assumption in many of our results below will be that $\Phi$ satisfies (DV3), and also
that the transition kernels satisfy a mild continuity condition: We require that they possess a
density with respect to some reference measure, uniformly over all initial conditions $x$ in the
sublevel set $C_W(r)$ of $W$. These assumptions are formalized in condition (DV3+) below.



(i) The Markov process $\Phi$ is $\psi$-irreducible, aperiodic, and it satisfies
condition (DV3) with some Lyapunov function $V : X \to [1, \infty)$;

(ii) There exists $T_0 > 0$ such that, for each $r < \|W\|_\infty$, there is a measure $\beta_r$
with $\beta_r(e^V) < \infty$ and $P_x\{\Phi(T_0) \in A,\ \tau_{C_W(r)^c} > T_0\} \le \beta_r(A)$ for all
$x \in C_W(r)$, $A \in \mathcal{B}$.

(DV3+)

Condition (DV3+) captures the essential ingredients of the large deviations conditions
imposed by Donsker and Varadhan in their pioneering work [14, 15, 16], and is in fact somewhat
weaker than those conditions. In Section 2 an extensive discussion of this assumption is
given, and its relation to several well-known conditions in the literature is described in detail.
In particular, part (ii) of condition (DV3+) [to which we will often refer as the "density
assumption" in (DV3+)] is generally the weaker of the two assumptions.

In most of our results we assume that the function $W$ in (DV3) is unbounded, $\|W\|_\infty :=
\sup_x |W(x)| = \infty$. When this is the case, we let $W_0 : X \to [1,\infty)$ be a fixed function in $L_\infty^W$,
whose growth at infinity is strictly slower than $W$ in the sense that

$$ \lim_{r \to \infty}\ \sup_{x \in X} \Bigl[ \frac{W_0(x)}{W(x)}\, \mathbb{I}_{\{W(x) > r\}} \Bigr] = 0. \tag{6} $$



Below we collect, from various parts of the paper, the "multiplicative" ergodic results we 
derive from (DV3+), in analogy to the "linear" ergodic-theoretic results stated in Theorem 1.1. 

Theorem 1.2 (Multiplicative Ergodicity) Suppose that the discrete-time chain $\Phi$ satisfies
condition (DV3+) with $W$ unbounded, and let $W_0 \in L_\infty^W$ be as in (6). Then the following
properties hold:

1. (W-multiplicative ergodicity) The process is positive recurrent with a unique invariant
probability measure $\pi$ satisfying, for some $\eta > 0$,

$$ \pi(e^{\eta V}) < \infty \quad \text{and} \quad \pi(e^{\eta W}) < \infty. $$

For any real-valued $F \in L_\infty^{W_0}$, there exist $\check F \in L_\infty^V$, $\Lambda(F) \in \mathbb{C}$, and constants $b_0 > 0$,
$B_0 < \infty$, such that

$$ \Bigl| E_x\Bigl[ \exp\Bigl( \sum_{t=0}^{T-1} [F(\Phi(t)) - \Lambda(F)] \Bigr) \Bigr] - e^{\check F(x)} \Bigr| \le e^{\eta V(x) + B_0 - b_0 T}, \tag{7} $$

for all $T \ge 1$, $x \in X$.

2. (W-multiplicative regularity) For any $A \in \mathcal{B}^+$ there exist constants $\eta = \eta(A) > 0$ and
$c = c(A) < \infty$, such that

$$ \log\Bigl( E_x\Bigl[ \exp\Bigl( \eta \sum_{t=0}^{\tau_A - 1} W(\Phi(t)) \Bigr) \Bigr] \Bigr) \le V(x) + c, \qquad x \in X. $$

3. (Multiplicative Fundamental 'Kernel') There exists a nonlinear operator $\mathcal{G}: L_\infty^{W_0} \to L_\infty^V$,
the multiplicative fundamental kernel, such that the function $\check F$ in (1.) can be expressed
as $\check F = \mathcal{G}(F)$ for real-valued $F \in L_\infty^{W_0}$, and $\check F$ solves the multiplicative Poisson equation,

$$ \mathcal{H}(\check F) = -F + \Lambda(F). \tag{8} $$

Proof. Assumption (DV3) combined with Theorem 2.2 implies that $\Phi$ is geometrically
ergodic (equivalently, $V_0$-uniformly ergodic) for some Lyapunov function $V_0: X \to [1,\infty)$,
hence the process is also positive recurrent. Moreover, $v_{\eta_0} := e^{\eta_0 V} \in L_\infty^{V_0}$ for some $0 < \eta_0 < 1$.
By the geometric ergodic theorem of [34] it follows that $\pi(v_{\eta_0}) < \infty$.

Under (DV3), the stochastic process $m = \{m(t)\}$ defined below is a super-martingale with
respect to $\mathcal{F}_t = \sigma\{\Phi(s) : 0 \le s \le t\}$, $t \ge 0$,

$$ m(t) := \exp\Bigl( V(\Phi(t)) + \sum_{s=0}^{t-1} [\delta W(\Phi(s)) - b\,\mathbb{I}_C(\Phi(s))] \Bigr), \qquad t \ge 0. \tag{9} $$

From the super-martingale property and Jensen's inequality we obtain the bound,

$$ E_x\Bigl[ \exp\Bigl( \eta_0 V(\Phi(t)) - \eta_0 b t + \sum_{s=0}^{t-1} \eta_0 \delta W(\Phi(s)) \Bigr) \Bigr] \le v_{\eta_0}(x), \qquad x \in X, $$

which gives the desired bound in (1.), where $\eta := \delta\eta_0$. The multiplicative ergodic limit (7)
follows from Theorem 3.1 (iii). The existence of an inverse $\mathcal{G}$ to $\mathcal{H}$ is given in Proposition 3.6,
which establishes the bound $\check F \in L_\infty^V$ stated in (1.), as well as result (3.).

Theorem 2.5 shows that (DV3) actually characterizes $W$-multiplicative regularity, and
provides the bound in (2.). □

As in [32], central to our development is the observation that the multiplicative Poisson
equation (8) can be written as an eigenvalue problem. In discrete time with $\Lambda = \Lambda(F)$, (8)
becomes $(e^F P)e^{\check F} = e^\Lambda e^{\check F}$, or, writing $f = e^F$, $\check f = e^{\check F}$ and $\lambda = e^\Lambda$, we obtain the eigenvalue
equation,

$$ P_f \check f = \lambda \check f, \qquad \text{for the kernel } P_f(x,dy) := f(x)P(x,dy). $$
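
On a finite state space this eigenvalue problem is a Perron-Frobenius problem for the matrix with entries $e^{F(x)}P(x,y)$, and it can be solved numerically. The following is a minimal sketch under that finite-state assumption; the matrix `P`, the functional `F` and the use of plain power iteration are illustrative choices and not the construction used in the paper.

```python
import math


def scaled_kernel(P, F):
    """P_f(x, y) = e^{F(x)} P(x, y)."""
    return [[math.exp(F[x]) * p for p in row] for x, row in enumerate(P)]


def power_iteration(K, iters=500):
    """Dominant eigenvalue and positive eigenvector of a primitive matrix K."""
    h = [1.0] * len(K)
    lam = 1.0
    for _ in range(iters):
        Kh = [sum(k * v for k, v in zip(row, h)) for row in K]
        lam = max(Kh)                  # sup-norm normalisation
        h = [v / lam for v in Kh]
    return lam, h


# Hypothetical irreducible, aperiodic 3-state chain and functional F.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
F = [0.3, -0.2, 0.1]

lam, f_check = power_iteration(scaled_kernel(P, F))
Lambda = math.log(lam)
F_check = [math.log(v) for v in f_check]

# Residual of the multiplicative Poisson equation H(F_check) = -F + Lambda,
# with H(G) = log(e^{-G} P e^{G}) as in the discrete-time case of (4).
H = [math.log(sum(p * fc for p, fc in zip(row, f_check))) - F_check[x]
     for x, row in enumerate(P)]
residual = max(abs(H[x] - (-F[x] + Lambda)) for x in range(len(P)))
```

The eigenvector is determined only up to a positive multiple (here it is normalised to have maximum one), which shifts `F_check` by an additive constant and leaves the residual unchanged.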

The assumptions of Theorem 1.2 are most easily illustrated in continuous time. Consider
the following diffusion model on $\mathbb{R}$, sometimes referred to as the Smoluchowski equation. For
a given potential $u : \mathbb{R} \to \mathbb{R}_+$, this is defined by the stochastic differential equation

$$ dX(t) = -u_x(X(t))\,dt + \sigma\,dW(t), \tag{10} $$

where $u_x := \frac{d}{dx}u$, and $W = \{W(t) : t \ge 0\}$ is a standard Brownian motion. On $C^2$, the
extended generator $\mathcal{A}$ of $X = \{X(t) : t \ge 0\}$ coincides with the differential generator given by,

$$ \mathcal{A} = \tfrac{1}{2}\sigma^2 \frac{d^2}{dx^2} - u_x \frac{d}{dx}. \tag{11} $$

When $\sigma > 0$ this is an elliptic diffusion, so that the semigroup $\{P^t\}$ has a family of smooth,
positive densities $P^t(x,dy) = p(x,y;t)\,dy$, $x,y \in \mathbb{R}$ [33]. Hence the Markov process $X$ is
$\psi$-irreducible, with $\psi$ equal to Lebesgue measure on $\mathbb{R}$.

A special case is the one-dimensional Ornstein-Uhlenbeck process,

$$ dX(t) = -\delta X(t)\,dt + \sigma\,dW(t), \tag{12} $$

where the corresponding potential function is $u(x) = \tfrac{1}{2}\delta x^2$, $x \in \mathbb{R}$.

Proposition 1.3 The Smoluchowski equation satisfies (DV3+) with $V = 1 + u\sigma^{-2}$ and $W =
1 + u_x^2$, provided the potential function $u: \mathbb{R} \to \mathbb{R}_+$ is $C^2$ and satisfies:

(a) $\lim_{|x| \to \infty} u(x) = \infty$;

$\liminf_{|x| \to \infty} (u_x(x))^2 > 0$.

Proof. Let $V = 1 + u\sigma^{-2}$. We then have,

$$ \mathcal{H}(V) := e^{-V}\mathcal{A}e^{V} = e^{-V}\bigl\{ -u_x (e^V \sigma^{-2} u_x) + \tfrac{1}{2}\sigma^2 (e^V[u_{xx}\sigma^{-2} + \sigma^{-4}u_x^2]) \bigr\} = -\tfrac{1}{2}\sigma^{-2}u_x^2 + \tfrac{1}{2}u_{xx}. $$

It is thus clear that the desired drift conditions hold. The proof is complete since $P^t(x,dy)$
possesses a continuous density $p(x,y;t)$ for each $t > 0$: We may take $T_0 = 1$, and for each $r$
we take $\beta_r$ equal to a constant times Lebesgue measure on $C_W(r)$. □

Proposition 1.3 does not admit an exact generalization to discrete-time models. However,
the discrete-time one-dimensional Ornstein-Uhlenbeck process,

$$ X(t+1) - X(t) = -\delta X(t) + W(t+1), \qquad t \ge 0,\ X(0) \in \mathbb{R}, \tag{13} $$

does satisfy the conclusions of the proposition, again with $V = 1 + \epsilon x^2$ for some $\epsilon > 0$, when
$\delta > 0$ and $W$ is an i.i.d. Gaussian process with positive variance.
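
For the recursion (13) with a quadratic Lyapunov function, the discrete-time nonlinear generator of (4) can be written in closed form using the Gaussian identity $E[e^{\epsilon Y^2}] = (1 - 2\epsilon s^2)^{-1/2}\exp(\epsilon m^2/(1 - 2\epsilon s^2))$ for $Y \sim N(m, s^2)$ and $2\epsilon s^2 < 1$. The sketch below is an illustration under assumptions that are not taken from the paper: the noise variance is called `s2`, the stability range $0 < \delta < 2$ is assumed, and the numerical values are hypothetical. It compares the closed form with a direct quadrature.

```python
import math


def H_closed_form(x, delta, s2, eps):
    """H(V)(x) for V(x) = 1 + eps * x^2 (the constant 1 cancels in H)."""
    a = 1.0 - delta
    k = 1.0 - 2.0 * eps * s2          # assumed positive
    return -0.5 * math.log(k) + eps * x * x * (a * a / k - 1.0)


def H_quadrature(x, delta, s2, eps, half_width=12.0, n=20001):
    """Trapezoid rule for log E[exp(eps * X1^2)] - eps * x^2,
    where X1 = (1 - delta) * x + sqrt(s2) * Z and Z is standard normal."""
    a, s = 1.0 - delta, math.sqrt(s2)
    dz = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        z = -half_width + i * dz
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * math.exp(eps * (a * x + s * z) ** 2 - 0.5 * z * z)
    mean = total * dz / math.sqrt(2.0 * math.pi)
    return math.log(mean) - eps * x * x


# Hypothetical parameters with (1 - delta)^2 < 1 - 2 * eps * s2.
delta, s2, eps = 0.5, 1.0, 0.1
checks = [(x, H_closed_form(x, delta, s2, eps), H_quadrature(x, delta, s2, eps))
          for x in (0.0, 1.0, 3.0)]

# Coefficient of x^2 in H(V); negative means a quadratic negative drift.
quadratic_coefficient = eps * ((1.0 - delta) ** 2 / (1.0 - 2.0 * eps * s2) - 1.0)
```

With these values the coefficient of $x^2$ is negative, so $\mathcal{H}(V)$ is bounded above by a negative multiple of $x^2$ outside a compact interval. The coefficient changes sign once $\epsilon$ exceeds $(1 - (1-\delta)^2)/(2 s^2)$, which is why $\epsilon$ has to be taken small.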

Notation. Often in the transition from ergodic results to their multiplicative counterparts we 
have to take exponentials of the corresponding quantities. In order to make this correspondence 
transparent we have tried throughout the paper to follow, as consistently as possible, the 
convention that the exponential version of a quantity is written as the corresponding lower 
case letter. For example, above we already had $f = e^F$, $\check f = e^{\check F}$ and $\lambda = e^\Lambda$.

1.2 Large Deviations 

From now on we restrict attention to the discrete-time case. 

Part 1 of Theorem 1.2 extends the multiplicative mean ergodic theorem of [32] to the larger
class of (possibly unbounded) functionals $F \in L_\infty^{W_0}$. In this section we assume that (DV3+)
holds with an unbounded function $W$, and we let a function $W_0 \in L_\infty^W$ be chosen as in (6).

For $n \ge 1$, let $L_n$ denote the empirical measures induced by $\Phi$ on $(X, \mathcal{B})$,

$$ L_n := \frac{1}{n}\sum_{t=0}^{n-1} \delta_{\Phi(t)}, \qquad n \ge 1, \tag{14} $$

and write $\langle \cdot, \cdot \rangle$ for the usual inner product; for $\mu$ a measure and $G$ a function, $\langle \mu, G \rangle =
\mu(G) := \int G(y)\,\mu(dy)$, whenever the integral exists. Then, from Theorem 3.1 it follows that
for any real-valued $F \in L_\infty^{W_0}$ and any $a \in \mathbb{R}$ we have the following version of the multiplicative
mean ergodic theorem,

$$ \exp\bigl(-n\Lambda(aF)\bigr)\, E_x\bigl[\exp\bigl(an\langle L_n, F\rangle\bigr)\bigr] \to \check f_a(x), \qquad n \to \infty,\ x \in X, \tag{15} $$

where $\check f_a := e^{\mathcal{G}(aF)}$ is the eigenfunction constructed in part 3 of Theorem 1.2, corresponding to
the function $aF$.

In Section 5, strong large deviations results for the sequence of empirical measures $\{L_n\}$
are derived from the multiplicative mean ergodic theorem in (15), using standard techniques
[9, 7, 12]. First we show that, for any initial condition $x \in X$, the sequence $\{L_n\}$ satisfies a large
deviations principle (LDP) in the space $\mathcal{M}_1$ of all probability measures on $(X, \mathcal{B})$ equipped
with the $\tau^{W_0}$-topology, that is, the topology generated by the system of neighborhoods

$$ N_F(c, \epsilon) := \bigl\{ \nu \in \mathcal{M}_1 : |\nu(F) - c| < \epsilon \bigr\}, \qquad \text{for real-valued } F \in L_\infty^{W_0},\ c \in \mathbb{R},\ \epsilon > 0. \tag{16} $$
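
The objects in (14) and (16) are easy to compute from an observed trajectory. The following is a minimal sketch for a chain with finitely many states; the trajectory `path` and the functional `F` are hypothetical examples, and the function names are illustrative.

```python
from collections import Counter


def empirical_measure(path):
    """L_n = (1/n) * sum_{t<n} delta_{Phi(t)}, as a dict state -> mass."""
    n = len(path)
    return {state: count / n for state, count in Counter(path).items()}


def pairing(mu, F):
    """<mu, F> = sum_y F(y) mu(y) for a measure on a finite set."""
    return sum(mass * F(state) for state, mass in mu.items())


def in_neighborhood(mu, F, c, eps):
    """Membership in N_F(c, eps) = {nu : |nu(F) - c| < eps}."""
    return abs(pairing(mu, F) - c) < eps


path = [0, 1, 1, 2, 1, 0, 0, 1]      # hypothetical observed trajectory


def F(x):
    """Hypothetical functional on the state space."""
    return x * x


L_n = empirical_measure(path)
value = pairing(L_n, F)

# n * <L_n, F> is the partial sum of F along the path.
partial_sum = sum(F(x) for x in path)
```

The identity $n\langle L_n, F\rangle = \sum_{t=0}^{n-1} F(\Phi(t))$ is what links statements about the empirical measures to statements about partial sums.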

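Moreover, the rate function that governs this LDP is the same as the Donsker-Varadhan
rate function, and can be characterized in terms of relative entropy,

$$ I(\nu) := \inf H(\nu \odot \check P \,\|\, \nu \odot P), $$

where the infimum is over all transition kernels $\check P$ for which $\nu$ is an invariant measure, $\nu \odot \check P$
denotes the bivariate measure $[\nu \odot \check P](dx, dy) := \nu(dx)\check P(x, dy)$ on $(X \times X, \mathcal{B} \times \mathcal{B})$, and $H(\,\cdot\,\|\,\cdot\,)$
denotes the relative entropy,

$$ H(\mu\|\nu) = \begin{cases} \int d\mu\, \log\frac{d\mu}{d\nu}, & \text{when } \frac{d\mu}{d\nu} \text{ exists;} \\ \infty, & \text{otherwise.} \end{cases} \tag{17} $$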

[Throughout the paper we follow the usual convention that the infimum of the empty set is 
+oo.] As we discuss in Section 2.6 and Section 5, the density assumption in (DV3+) (ii) is 
weaker than the continuity assumptions of Donsker and Varadhan, but it cannot be removed 
entirely. 
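
For measures on a finite set, the relative entropy in (17) and the bivariate measures entering the rate function are finite sums. The sketch below is illustrative only: the two-state kernels `P` and `P_check` and the measures `pi` and `nu` are hypothetical. Evaluating the entropy for one admissible kernel gives an upper bound on the infimum, not the infimum itself.

```python
import math


def relative_entropy(mu, nu):
    """H(mu || nu) = sum_y mu(y) log(mu(y) / nu(y)); +inf unless mu << nu."""
    total = 0.0
    for m, v in zip(mu, nu):
        if m == 0.0:
            continue                   # 0 * log 0 = 0 by convention
        if v == 0.0:
            return math.inf            # the density d(mu)/d(nu) does not exist
        total += m * math.log(m / v)
    return total


def bivariate(nu, K):
    """[nu (.) K](x, y) = nu(x) K(x, y), flattened row by row."""
    return [nu[x] * k for x, row in enumerate(K) for k in row]


# Hypothetical two-state chain with invariant measure pi = (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2.0 / 3.0, 1.0 / 3.0]

# Taking the kernel P itself shows that the rate vanishes at pi.
rate_at_pi = relative_entropy(bivariate(pi, P), bivariate(pi, P))

# nu = (1/2, 1/2) is invariant for the symmetric kernel P_check, so this
# entropy is an upper bound on the infimum defining the rate at nu.
nu = [0.5, 0.5]
P_check = [[0.8, 0.2],
           [0.2, 0.8]]
upper_bound = relative_entropy(bivariate(nu, P_check), bivariate(nu, P))
```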

Further, the precise convergence in (15) leads to exact large deviations expansions analogous
to those obtained by Bahadur and Ranga Rao [1] for independent random variables, and
to the local expansions established in [32] for geometrically ergodic chains. For real-valued,
non-lattice functionals $F \in L_\infty^{W_0}$, in Theorem 5.3 we obtain the following: For $c > \pi(F)$ and
$x \in X$,

$$ P_x\Bigl\{ \sum_{t=0}^{n-1} F(\Phi(t)) \ge nc \Bigr\} \sim \frac{\check f_a(x)}{a\sqrt{2\pi n \sigma_a^2}}\, e^{-nJ(c)}, \qquad n \to \infty, \tag{18} $$

where $a \in \mathbb{R}$ is chosen such that $\frac{d}{da}\Lambda(aF) = c$, $\check f_a(x)$ is the eigenfunction appearing in the
multiplicative mean ergodic theorem (15), $\sigma_a^2 = \frac{d^2}{da^2}\Lambda(aF)$, and the exponent $J(c)$ is given in
terms of $I(\nu)$ as

$$ J(c) := \inf\{ I(\nu) : \nu \text{ is a probability measure on } (X, \mathcal{B}) \text{ satisfying } \nu(F) \ge c \}. \tag{19} $$

A corresponding expansion is given for lattice functionals.

These large deviations results extend the classical Donsker-Varadhan LDP [14, 15] in several
directions: First, our conditions are weaker. Second, when (DV3+) holds with an unbounded
function $W$, the $\tau^{W_0}$-topology is finer and hence stronger than either the topology of weak
convergence, or the $\tau$-topology, with respect to which the LDP for the empirical measures
$\{L_n\}$ is usually established [24, 4, 13]. Third, apart from the LDP we also obtain precise large
deviations expansions as in (18) for the partial sums with respect to (possibly unbounded)
functionals $F \in L_\infty^{W_0}$.

Following the Donsker-Varadhan papers, a large amount of work has been done in establishing
large deviations properties of Markov chains under a variety of different assumptions;
see [12, 13] for detailed treatments. Under conditions similar to those in this paper, Ney and
Nummelin have proved "pinned" large deviations principles in [37, 38]. In a different vein,
under much weaker assumptions (essentially under irreducibility alone) de Acosta [10] and
Jain [28] have proved general large deviations lower bounds, but these are, in general, not
tight.

One of the first places where the Feller continuity assumption of Donsker and Varadhan 
was relaxed is Bolthausen's work [4]. There, a very stringent condition on the chain is imposed, 
often referred to in the literature as Stroock's uniform condition (U). In Section 2.5 we argue 
that (U) is much more restrictive than the conditions we impose in this paper. In particular, 
condition (U) implies Doeblin recurrence as well as the density assumption in (DV3+) (ii). 

More recently, Eichelsbacher and Schmock [19] proved an LDP for the empirical measures of
Markov chains, again under the uniform condition (U). This LDP is proved in a strict subset of
$\mathcal{M}_1$, and with respect to a topology finer than the usual $\tau$-topology and similar in spirit to the
$\tau^{W_0}$-topology introduced here. In addition to (U), the results of [19] require strong integrability
conditions that are a priori hard to verify: In the above notation, in [19] it is assumed that for
at least one unbounded function $W_0 : X \to \mathbb{R}$, we have $E_x[\exp\{a|W_0(\Phi(n))|\}] < \infty$, uniformly
over $n \ge 1$, for all real $a > 0$. This assumption is closely related to our condition (DV3), and,
as we show in Section 3, (DV3) in particular provides a means for identifying a natural class
of functions $W_0$ satisfying this bound.

2 Structural Assumptions 

There is a wide range of interrelated tools that have been used to establish large deviations 
properties for Markov processes and to develop parts of the corresponding multiplicative er- 
godic theory. Most of these tools rely on a functional-analytic setting within which spectral 
properties of the process are examined. A brief survey of these approaches is given in [32], 
where the main results relied on the geometric ergodicity of the process. In this section we 
show how the assumptions used in prior work may be expressed in terms of the drift criteria 
introduced here and describe the operator-theoretic setting upon which all our subsequent 
results will be based. 

2.1 Drift Conditions 

Recall that the (extended) generator $\mathcal{A}$ of $\Phi$ is defined as follows: For a function $g : X \to \mathbb{C}$,
we write $\mathcal{A}g = h$ if for each initial condition $\Phi(0) = x \in X$ the process $\ell(t) := \sum_{s=0}^{t-1} h(\Phi(s)) -
g(\Phi(t))$, $t \ge 1$, is a local martingale with respect to the natural filtration $\{\mathcal{F}_t = \sigma(\Phi(s), 0 \le
s \le t) : t \ge 1\}$. In discrete time, the extended generator is simply $\mathcal{A} = P - I$, and its domain
contains all measurable functions on $X$.

The following drift conditions are considered in [34] in discrete time,

$$ \begin{aligned} \text{(V2)} \qquad & \mathcal{A}V \le -\delta + b\,\mathbb{I}_C \\ \text{(V3)} \qquad & \mathcal{A}V \le -\delta W + b\,\mathbb{I}_C \\ \text{(V4)} \qquad & \mathcal{A}V \le -\delta V + b\,\mathbb{I}_C \end{aligned} $$

where in each case $C$ is small, $V : X \to (0, \infty]$ is finite a.e. $[\psi]$, and $b < \infty$, $\delta > 0$ are constants.
We further assume that $W$ is bounded below by unity in (V3), and that $V$ is bounded from
below by unity in (V4). It is easy to see that (V2)-(V4) are stated in order of increasing
strength: (V4) $\Rightarrow$ (V3) $\Rightarrow$ (V2).

Analogous multiplicative versions of these drift criteria are defined as follows,

$$ \begin{aligned} \text{(DV2)} \qquad & \mathcal{H}V \le -\delta + b\,\mathbb{I}_C \\ \text{(DV3)} \qquad & \mathcal{H}V \le -\delta W + b\,\mathbb{I}_C \\ \text{(DV4)} \qquad & \mathcal{H}V \le -\delta V + b\,\mathbb{I}_C, \end{aligned} $$

where $\mathcal{H}$ is the nonlinear generator defined in (4). The following implications follow easily
from the definitions:

Proposition 2.1 For each $k = 2,3,4$, the drift condition (DV$k$) implies (V$k$).

Proof. We provide a proof only for $k = 3$ since all are similar. Under (DV3), $Pe^V \le
e^{V - \delta W + b\mathbb{I}_C}$. Jensen's inequality gives $e^{PV} \le Pe^V$, and taking logarithms gives (V3). □

We find that Proposition 2.1 gives a poor bound in general. Theorem 2.2 shows that (DV2)
actually implies (V4). Its proof is given in the Appendix, after the proof of Theorem 2.5.

Theorem 2.2 ((DV2) $\Rightarrow$ (V4)) Suppose $\Phi$ is $\psi$-irreducible and aperiodic. If (DV2) holds for
some $V: X \to (0, \infty]$, then (V4) holds for some $V_0$ which is equivalent to $v_\eta := e^{\eta V}$ for some
$\eta > 0$, in the sense that,

$$ V_0 \in L_\infty^{v_\eta} \quad \text{and} \quad v_\eta \in L_\infty^{V_0}. $$

2.2 Spectral Theory Without Reversibility

The spectral theory described in this paper and in [32] is based on various operator semigroups
$\{\widehat P^n : n \in \mathbb{Z}_+\}$, where each $\widehat P^n$ is the $n$th composition of a possibly non-positive kernel $\widehat P$.
Examples are the transition kernel $P$; the multiplication kernel $I_G(x,dy) = G(x)\delta_x(dy)$, for a
given function $G$; the scaled kernel defined by

$$ P_f(x,dy) := f(x)P(x,dy), \tag{20} $$

for any function $F: X \to \mathbb{C}$ with $f = e^F$; and also the twisted kernel, defined for a given
function $h: X \to (0, \infty)$ by

$$ \check P_h(x, A) := [I_{Ph}^{-1} P I_h](x, A) = \frac{\int_A P(x,dy)h(y)}{Ph(x)}, \qquad x \in X,\ A \in \mathcal{B}. \tag{21} $$

This is a probabilistic kernel (i.e., a positive kernel with $\check P_h(x, X) = 1$ for all $x$) provided
$Ph(x) < \infty$, $x \in X$. It is a generalization of the twisted kernel considered in [32], where the
function $h$ was taken as $h = \check f$ for a specially constructed $\check f$. It may also be regarded as a
version of Doob's $h$-transform [40].

The most common approach to spectral decompositions for probabilistic semigroups $\{P^n\}$
is to impose a reversibility condition [23, 5, 41]. The motivation for this assumption comes
from the $L_2$ setting in which these problems are typically posed, and the well-known fact that
the semigroup $\{P^n\}$ is then self-adjoint. We avoid a Hilbert space setting here and instead
consider the weighted function spaces defined in (2); cf. [30, 31, 25, 35, 32].

The weighting function is determined by the particular drift condition satisfied by the
process. In particular, under (DV3) it follows from the convexity of $\mathcal{H}$ (see Proposition 4.4)
that for any $0 < \eta \le 1$ we have the bound,

$$ \mathcal{H}(\eta V) \le -\delta\eta W + b\eta\,\mathbb{I}_C, \quad \text{on } S_V, \tag{22} $$

which may be equivalently expressed as $Pv_\eta \le e^{-\delta\eta W + b\eta\mathbb{I}_C}\, v_\eta$, where $v_\eta := e^{\eta V}$. This bound
implies that $P_f : L_\infty^{v_\eta} \to L_\infty^{v_\eta}$ is a bounded linear operator for any function $F$ satisfying $\|F^+\|_W \le
\eta\delta$ (where $F^+ := \max(F, 0)$), and any $0 < \eta \le 1$.

Under any one of the above Lyapunov drift criteria, we will usually consider the function
$v$ defined in terms of the corresponding Lyapunov function $V$ on $X$ via $v = e^V$. For any such
function $v: X \to [1, \infty)$ and any linear operator $\widehat P : L_\infty^v \to L_\infty^v$, we denote the induced operator
norm by,

$$ |||\widehat P|||_v := \sup\Bigl\{ \frac{\|\widehat P h\|_v}{\|h\|_v} : h \in L_\infty^v,\ \|h\|_v \neq 0 \Bigr\}. \tag{23} $$

The spectrum $\mathcal{S}(\widehat P) \subset \mathbb{C}$ of $\widehat P$ is the set of $z \in \mathbb{C}$ such that the inverse $[Iz - \widehat P]^{-1}$ does not
exist as a bounded linear operator on $L_\infty^v$. We let $\xi = \xi(\{\widehat P^n\})$ denote the spectral radius of
the semigroup $\{\widehat P^n\}$,

$$ \xi(\{\widehat P^n\}) := \lim_{n \to \infty} |||\widehat P^n|||_v^{1/n}. \tag{24} $$

In general, the quantities $|||\widehat P|||_v$ and $\xi$ depend upon the particular weighting function $v$. If $\widehat P$
is a positive operator, then $\xi$ is greater than or equal to the generalized principal eigenvalue,
or g.p.e. (see e.g. [39]), and they are actually equal under suitable regularity assumptions (see
[2, 32], and Proposition 2.8 below).
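
For a kernel on a finite state space the induced norm (23) is a weighted maximum row sum, and the limit (24) can be approximated by taking a large power. The sketch below assumes this finite-state setting; the matrix `P`, the weight `v` and the power 200 are hypothetical choices made only for illustration.

```python
def weighted_norm(K, v):
    """|||K|||_v = max_x (1 / v(x)) * sum_y |K(x, y)| v(y) for a finite kernel."""
    return max(sum(abs(k) * vy for k, vy in zip(row, v)) / v[x]
               for x, row in enumerate(K))


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(len(A))]


def spectral_radius_estimate(K, v, n):
    """|||K^n|||_v ** (1/n), an approximation to the limit in (24)."""
    Kn = K
    for _ in range(n - 1):
        Kn = matmul(Kn, K)
    return weighted_norm(Kn, v) ** (1.0 / n)


# Hypothetical transition matrix and weighting function v >= 1.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
v = [1.0, 2.0, 5.0]

one_step = weighted_norm(P, v)
xi_200 = spectral_radius_estimate(P, v, 200)
```

In this example the one-step norm exceeds one because of the weighting, while the $n$-th root of the norm of the $n$-th power approaches one, the spectral radius of a transition matrix.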

As in [32], we say that $\widehat P$ admits a spectral gap if there exists $\epsilon_0 > 0$ such that the set
$\mathcal{S}(\widehat P) \cap \{z : |z| \ge \xi - \epsilon_0\}$ is finite and contains only poles of finite multiplicity; recall that
$z_0 \in \mathcal{S}(\widehat P)$ is a pole of (finite) multiplicity $n$ if:

(i) $z_0$ is isolated in $\mathcal{S}(\widehat P)$, i.e., for some $\epsilon_1 > 0$ we have $\{z \in \mathcal{S}(\widehat P) : |z - z_0| \le \epsilon_1\} = \{z_0\}$;

(ii) The associated projection operator

$$ Q := \frac{1}{2\pi i} \int_{\partial\{z : |z - z_0| \le \epsilon_1\}} [Iz - \widehat P]^{-1}\,dz, \tag{25} $$

can be expressed as a finite linear combination of some $\{s_i\} \subset L_\infty^v$, $\{\nu_i\} \subset \mathcal{M}_1^v$,

$$ Q = \sum_{i,j=0}^{n-1} m_{i,j}\,[s_i \otimes \nu_j], $$

where $[s \otimes \nu](x,dy) := s(x)\nu(dy)$.

See [32, Sec. 4] for more details. Moreover, we say that $\widehat P$ is $v$-uniform if it admits a spectral
gap and also there exists a unique pole $\lambda_\circ \in \mathcal{S}(\widehat P)$ of multiplicity one, satisfying $|\lambda_\circ| = \xi(\{\widehat P^n\})$.

Recall that a Markov process $\Phi$ is called geometrically ergodic [32] or equivalently $V$-uniformly
ergodic [34] if it is positive recurrent, and the semigroup converges in the induced
operator norm,

$$ |||P^n - \mathbf{1} \otimes \pi|||_V \to 0, \qquad n \to \infty, $$

where $\mathbf{1}$ denotes the constant function $\mathbf{1}(x) \equiv 1$. It is known that this is characterized by
condition (V4). Under this assumption, in [32] we proved that $\Phi$ satisfies a "local" large
deviations principle. In this paper under the stronger condition (DV3+) we show that these
local results can be extended to a full large deviations principle.

The following result, taken from [32, Proposition 4.6], says that geometric ergodicity is 
equivalent to the existence of a spectral gap: 

Theorem 2.3 (Spectral Gap $\Leftrightarrow$ (V4)) Let $\Phi$ be a $\psi$-irreducible and aperiodic Markov chain.

(a) If $\Phi$ is geometrically ergodic with Lyapunov function $V$, then its transition kernel $P$
admits a spectral gap in $L_\infty^V$ and it is $V$-uniform.

(b) Conversely, if $P$ is $V_0$-uniform, then $\Phi$ is geometrically ergodic with respect to some
Lyapunov function $V \in L_\infty^{V_0}$.

Next we want to investigate the corresponding relationship between condition (DV3) and
the property that the kernel $P$ has a discrete spectrum in $L_\infty^v$. First we establish an analogous 'near
equivalence' between assumption (DV3) and the notion of $v$-separability, and in Theorem 3.5
we show that $v$-separability implies the discrete spectrum property.

For any $v : X \to [1,\infty]$, finite a.e. $[\psi]$, we say that the linear operator $\widehat P: L_\infty^v \to L_\infty^v$ is
$v$-separable if it can be approximated uniformly by kernels with finite rank. That is, for each
$\epsilon > 0$, there exists a finite-rank operator $K_\epsilon$ such that $|||\widehat P - K_\epsilon|||_v \le \epsilon$. Since the kernel $K_\epsilon$ has
a finite-dimensional range space, we are assured of the existence of an integer $n \ge 1$, functions
$\{s_i : 1 \le i \le n\} \subset L_\infty^v$, and probability measures $\{\nu_i : 1 \le i \le n\} \subset \mathcal{M}_1^v$, such that $K_\epsilon$ may be
expressed,

$$ K_\epsilon(x,dy) = \sum_{i=1}^{n} [s_i \otimes \nu_i](x,dy). \tag{26} $$

Note that the eigenvalues of $K_\epsilon$ may be interpreted as a pseudo-spectrum; see [8].

The following equivalence, established in the Appendix, illustrates the intimate relationship
between the essential ingredients of the Donsker-Varadhan conditions, and the associated
spectral theory as developed in this paper. Note that in Theorem 2.4 the density assumption
from part (ii) of (DV3+) has been replaced by the more natural and weaker statement that
$I_{C_W(r)}P^{T_0}$ is $v$-separable for all $r$.$^1$ The fact that this is indeed weaker than the assumption in
(DV3+) (ii) follows from Lemma B.3 in the Appendix. Applications of Theorem 2.4 to diffusions
on $\mathbb{R}^n$ and refinements in this special case are developed in [26].

Theorem 2.4 ($v$-Separability $\Leftrightarrow$ (DV3)) Let $\Phi$ be a $\psi$-irreducible and aperiodic Markov chain
and let $T_0 > 0$ be arbitrary. The following are equivalent:

(a) Condition (DV3) holds with $V : X \to [1, \infty)$; $W$ unbounded; and $I_{C_W(r)}P^{T_0}$ is $v$-separable
for all $r$, where $v = e^V$.

(b) The kernel $P^{T_0}$ is $v_0$-separable for some unbounded function $v_0 : X \to [1, \infty)$.

$^1$ The notation $I_A\widehat P$ for a set $A \in \mathcal{B}$ and a kernel $\widehat P$ is used to denote the kernel $\mathbb{I}_A(x)\widehat P(x, dy)$.

We say that a linear operator $\widehat P : L_\infty^v \to L_\infty^v$ has a discrete spectrum in $L_\infty^v$ if its spectrum
$\mathcal{S}$ has the property that $\mathcal{S} \cap K$ is finite, and contains only poles of finite multiplicity, for any
compact set $K \subset \mathbb{C} \setminus \{0\}$. It is shown in Theorem 3.5 that the spectrum of $P$ is discrete under
the conditions of (b) above.

Taking a different operator-theoretic approach, Deuschel and Stroock [13] prove large deviations
results for the empirical measures of stationary Markov chains under the condition of
hypercontractivity (or hypermixing). In particular, their conditions imply that for some $T_0$,
the kernel $P^{T_0}(x,dy)$ is a bounded linear operator from $L_2(\pi)$ to $L_4(\pi)$, with norm equal to 1.



2.3 Multiplicative Regularity 

Recall the definition of the empirical measures in (14), and the hitting times {ta} defined in 
(3). The next set of results characterize the drift criterion (DV3) in terms of the following 
regularity assumptions: 

Regularity

(i) A set $C \in \mathcal{B}$ is called geometrically regular if for any $A \in \mathcal{B}^+$ there exists
$\eta = \eta(A) > 0$ such that

$$\sup_{x \in C} E_x[\exp(\eta \tau_A)] < \infty.$$

The Markov process $\Phi$ is called geometrically regular if there exists a
geometrically regular set $C$, and $\eta > 0$ such that

$$E_x[\exp(\eta \tau_C)] < \infty, \quad x \in X.$$

(ii) A set $C \in \mathcal{B}$ is called $H$-multiplicatively regular ($H$-m.-regular) if for any
$A \in \mathcal{B}^+$, there exists $\eta = \eta(A) > 0$ satisfying,

$$\sup_{x \in C} E_x\bigl[\exp\bigl(\eta \tau_A \langle L_{\tau_A}, H\rangle\bigr)\bigr] < \infty.$$

The Markov process $\Phi$ is $H$-m.-regular if there exists an $H$-m.-regular
set $C \in \mathcal{B}$, and $\eta > 0$ such that

$$E_x\bigl[\exp\bigl(\eta \tau_C \langle L_{\tau_C}, H\rangle\bigr)\bigr] < \infty, \quad x \in X.$$
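As a hedged illustration of geometric regularity (a toy case, not taken from the paper), suppose that from every state outside a set $A$ the chain enters $A$ in one step with a fixed probability $p$. Under this assumption the hitting time $\tau_A$ is geometric, and its exponential moment is finite exactly when $\eta < -\log(1-p)$. The function names and the values of `p` and `eta` below are hypothetical.

```python
import math

def exp_moment_series(p, eta, terms=5000):
    """Truncated series for E[exp(eta * tau)] with tau ~ Geometric(p) on {1, 2, ...}.

    P(tau = k) = p * (1 - p)**(k - 1), so the k-th term equals
    p * e^eta * ((1 - p) * e^eta)**(k - 1).
    """
    ratio = (1.0 - p) * math.exp(eta)
    first = p * math.exp(eta)
    return sum(first * ratio ** (k - 1) for k in range(1, terms + 1))

def exp_moment_closed_form(p, eta):
    """Closed form of the same moment; infinite once (1 - p) * e^eta >= 1."""
    ratio = (1.0 - p) * math.exp(eta)
    if ratio >= 1.0:
        return math.inf
    return p * math.exp(eta) / (1.0 - ratio)

p = 0.1                            # assumed one-step entrance probability
eta_critical = -math.log(1.0 - p)  # exponential moments exist only below this rate
finite_moment = exp_moment_closed_form(p, 0.05)
```

In this toy case any $\eta$ below `eta_critical` witnesses the regularity bound, uniformly over the starting state, because the law of $\tau_A$ does not depend on it.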



In [34, Theorem 15.0.1] a precise equivalence is given between geometric regularity and
the existence of a solution to the drift inequality (V4). The following analogous result shows
that (DV3) characterizes multiplicative regularity. A proof of Theorem 2.5 is included in the
Appendix.

Theorem 2.5 (Multiplicative Regularity $\Leftrightarrow$ (DV3)) For any $H : X \to [1,\infty)$, the following
are equivalent:

(i) $\Phi$ is $H$-m.-regular;

(ii) The drift inequality (DV3) holds for some $V : X \to (0, \infty)$ and with $H \in L_\infty^W$.

If either of these equivalent conditions holds, then for any $A \in \mathcal{B}^+$ there exist $\epsilon > 0$, $1 > \eta > 0$,
and $B < \infty$ satisfying,

$$E_x\bigl[\exp\bigl(\epsilon \tau_A \langle L_{\tau_A}, H\rangle + \eta V(\Phi(\tau_A))\bigr)\bigr] \le \exp(\eta V(x) + B), \quad x \in X,$$

where $V$ is the solution to (DV3) in (ii).

In a similar vein, in [44] the following condition is imposed for a diffusion on $X = \mathbb{R}^n$:

For any $n \ge 1$ there exists $K_n \subset X$ compact, such that for any
compact set $K \subset X$,

$$\sup_{x \in K} E_x[e^{n \tau_{K_n}}] < \infty. \tag{27}$$

In [44, 42] it is shown that this condition is closely related to the existence of a solution to
(DV3), where the function $W$ is further assumed to have compact sublevel sets. Under these
assumptions, and under continuity assumptions similar to those imposed in [43], it is possible
to show that the operator $P^n$ is compact for all $n > 0$ [42, Theorem 2.1], or [11, Lemma 3.4].

We show in Proposition 2.6 that the bound assumed in [44] always holds under (DV3+).
We say that $G : X \to \mathbb{R}_+$ is coercive if the sublevel set $\{x : G(x) \le n\}$ is precompact for each
$n \ge 1$. Coercive functions exist only when $X$ is $\sigma$-compact.

Proposition 2.6 Let $\Phi$ be a $\psi$-irreducible and aperiodic Markov chain on $X$. Assume moreover
that $X = \mathbb{R}^n$; that condition (DV3+) holds with $V : X \to [1, \infty)$ continuous; $W$ unbounded;
and the kernels $\{I_{C_W(r)} P^{T_0} : r \ge 1\}$ are $v$-separable for some $T_0 \ge 1$. Then, there exists a
sequence of compact sets $\{K_n : n \ge 1\}$ satisfying (27).

Proof. Lemma B.2 combined with Proposition C.7 implies that we may construct functions 
(Vi, W\) from X to [1, oo), and a constant b\ satisfying the following: sup{V(x) : x G Cw-i {f)} < 
oo for each r; W\, V\ G L^; W\ is coercive; and H(Vi) < V\ — W\ + b\. Lemma C.8 combined 
with continuity of V then implies that (27) also holds, with K r = closure ofCwi(n r ) for some 
sequence of positive integers {n r }. □ 

Proposition 2.6 has a partial converse:

Proposition 2.7 Suppose the chain $\Phi$ is $\psi$-irreducible and aperiodic. Suppose moreover that
$X = \mathbb{R}^n$; that the support of $\psi$ has non-empty interior; that $P$ has the Feller property; and that
there exists a sequence of compact sets $\{K_n : n \ge 1\}$ satisfying (27). Then Condition (DV3)
holds with $V, W : X \to [1,\infty)$ continuous and coercive.

Proof. Proposition A. 2 asserts that there exists a solution to the inequality H(V) < —^W + 
blc with (V 7 , W) continuous and coercive, C compact, and b < oo. Under the assumptions 
of the proposition, compact sets are small (combine Proposition 6.2.8 with Theorem 5.5.7 of 
[34]). We may conclude that C is small, and hence that (DV3) holds. □

2.4 Perron-Frobenius Theory

As in [32] we find strong connections between the theory developed in this paper, and the 
Perron-Frobenius theory of positive semigroups, as developed in [39]. 

Suppose that $\{\hat P^n : n \in \mathbb{Z}_+\}$ is a semigroup of positive operators. We assume that $\{\hat P^n\}$
has finite spectral radius $\hat\xi$ in $L_\infty^v$. Then, the resolvent kernel defined by $R_\lambda := [I\lambda - \hat P]^{-1}$ is
a bounded linear operator on $L_\infty^v$ for each $\lambda > \hat\xi$. We assume moreover that the semigroup
is $\psi$-irreducible, that is, whenever $A \in \mathcal{B}$ satisfies $\psi(A) > 0$, then $\sum_{k=0}^\infty \hat P^k(x, A) > 0$, for all
$x \in X$. If $\Phi$ is a $\psi$-irreducible Markov chain, then for any measurable function $F : X \to \mathbb{R}$, the
kernel $\hat P = P_f$ generates a $\psi$-irreducible semigroup. In general, under $\psi$-irreducibility of the
semigroup, one may find many solutions to the minorization condition,

$$R_\lambda(x, A) = \sum_{k=0}^\infty \lambda^{-k-1} \hat P^k(x, A) \ge s(x)\nu(A), \quad x \in X,\ A \in \mathcal{B}, \tag{28}$$

with $\lambda > 0$, $s \in \mathcal{B}^+$, and $\nu \in \mathcal{M}^+$; that is, $s : X \to \mathbb{R}_+$ is measurable with $\psi(s) > 0$, and $\nu$ is a
positive measure on $(X, \mathcal{B})$ satisfying $\nu(X) > 0$. The pair $(s, \nu)$ is then called small, just as in
the probabilistic setting.

Theorem 3.2 of [39] states that there exists a constant $\hat\lambda \in (0, \infty]$, the generalized principal
eigenvalue, or g.p.e., such that, for any small function $s \in \mathcal{B}^+$,

$$\sum_{k=0}^\infty \lambda^{-k-1} \hat P^k s(x) \begin{cases} = \infty & \text{for all } x \in X,\ \lambda < \hat\lambda \\ < \infty & \text{for a.e. } x \in X\ [\psi],\ \lambda > \hat\lambda. \end{cases} \tag{29}$$

The semigroup is said to be $\hat\lambda$-transient if for one, and then all small pairs $(s, \nu)$, satisfying
$s \in \mathcal{B}^+$, $\nu \in \mathcal{M}^+$, we have $\sum_{k=0}^\infty \hat\lambda^{-k-1} \nu \hat P^k s < \infty$; otherwise it is called $\hat\lambda$-recurrent.

Proposition 2.8 shows that the generalized principal eigenvalue coincides with the spectral 
radius when considering positive semigroups that admit a spectral gap. Related results may 
be found in Theorem 4.4 and Proposition 4.5 of [32]. 

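Proposition 2.8 Suppose that $\{\hat P^n : n \in \mathbb{Z}_+\}$ is a $\psi$-irreducible, positive semigroup. Suppose
moreover that the semigroup admits a spectral gap in $L_\infty^v$, with finite spectral radius $\hat\xi$. Then:

(i) $\hat\xi = \hat\lambda$.

(ii) The semigroup is $\hat\lambda$-recurrent.

(iii) $\hat P$ is $v$-uniform.

(iv) For any $\lambda > \hat\xi$, and any $(s, \nu)$ that solve (28) with $s \in \mathcal{B}^+$, $\nu \in \mathcal{M}^+$, the function
$h := [I\gamma - (R_\lambda - s \otimes \nu)]^{-1} s \in L_\infty^v$ is an eigenfunction, where $\gamma = (\lambda - \hat\xi)^{-1}$ as in the proof below.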

Proof. Suppose that either (i) or (ii) is false. In either case, for all small pairs $(s, \nu)$,

$$\lim_{\lambda \downarrow \hat\xi} \nu R_\lambda s = \sum_{k=0}^\infty \hat\xi^{-k-1} \nu \hat P^k s < \infty.$$

It then follows that the projection operator $Q$ defined in (25) satisfies $\nu Q s = 0$ for all small
$s \in L_\infty^v$, $\nu \in \mathcal{M}_1^v$. This is only possible if $Q = 0$, which is impossible under our assumption
that the semigroup admits a spectral gap.

To complete the proof, observe that the semigroup generated by the kernel $R_\lambda$ also admits
a spectral gap, with spectral radius $\gamma = (\lambda - \hat\xi)^{-1}$. It follows that there is a closed ball $D \subset \mathbb{C}$
containing $\gamma$ such that the two kernels below are bounded linear operators on $L_\infty^v$ for each
$z \in D \setminus \{\gamma\}$,

$$X_z = [Iz - R_\lambda]^{-1}, \qquad Y_z = [Iz - (R_\lambda - s \otimes \nu)]^{-1}.$$

From (i) and (ii) we know that $R_\lambda$ is $\gamma$-recurrent, which implies that $\nu Y_\gamma s = 1$, and that
$\hat P h = \hat\xi h$ (see [39, Theorem 5.1]). Moreover, again from (i), (ii), since $\nu Y_\gamma s < \infty$ it follows
that the spectral radius of $(R_\lambda - s \otimes \nu)$ is strictly less than $\gamma$, which implies (iii). Finally, since
$|||Y_\gamma|||_v < \infty$ we may conclude that $h \in L_\infty^v$, and this establishes (iv). □

On specializing to the kernels $\{P_f : F \in L_\infty^{W_0}\}$ we obtain the following corollary. Define for
any measurable function $F : X \to (-\infty, \infty]$:

(i) $\Lambda(F) = \log(\lambda(F))$ = the logarithm of the g.p.e. for $P_f$.

(ii) $\Xi(F) = \log(\xi(F))$ = the logarithm of the spectral radius of $P_f$. (30)
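For a finite state space the two functionals in (30) can be computed directly: the kernel $P_f(x, dy) = e^{F(x)}P(x, dy)$ is then a positive matrix, and both the g.p.e. and the spectral radius reduce to its Perron-Frobenius eigenvalue. The following minimal sketch uses power iteration as a stand-in for the operator-theoretic construction. The two-state matrix `P`, the functional `F` and the helper names are hypothetical choices made for illustration.

```python
import math

def twisted_matrix(P, F):
    """Return the matrix of P_f(x, y) = exp(F[x]) * P[x][y] for a finite chain."""
    n = len(P)
    return [[math.exp(F[x]) * P[x][y] for y in range(n)] for x in range(n)]

def perron_eigenpair(K, iters=2000):
    """Power iteration for a positive matrix K.

    Returns (lam, h) with K h approximately equal to lam * h and max(h) == 1.
    """
    n = len(K)
    h = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        Kh = [sum(K[x][y] * h[y] for y in range(n)) for x in range(n)]
        lam = max(Kh)                 # current eigenvalue estimate
        h = [v / lam for v in Kh]     # renormalize so that max(h) == 1
    return lam, h

P = [[0.9, 0.1], [0.4, 0.6]]   # assumed transition matrix
F = [0.3, -0.2]                # assumed functional on the two states
lam, h = perron_eigenpair(twisted_matrix(P, F))
Lambda = math.log(lam)         # finite-state analogue of Lambda(F) in (30)
```

With `F = [0.0, 0.0]` the matrix is stochastic, the eigenvalue is 1, and the sketch returns a log-eigenvalue of zero, as expected.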

Lemma 2.9 Consider a $\psi$-irreducible Markov chain, and a measurable function $G : X \to \mathbb{R}_+$.
If $\Xi(G) < \infty$ then $G \in L_\infty^V$.

Proof. We have $|||P_g^n|||_v < \infty$ for some $n \ge 1$ when $\Xi(G) < \infty$. Consequently, since $G$ and $V$
are assumed positive, we have $g(x) \le P_g^n v(x) \le |||P_g^n|||_v v(x)$, for all $x \in X$. □

Proposition 2.10 Under (DV3+) the functional $\Xi$ is finite-valued and convex on $L_\infty^{W_0}$, and
may be identified as the logarithm of the generalized principal eigenvalue:

$$\Xi(F) = \Lambda(F), \quad F \in L_\infty^{W_0}.$$

Proof. Theorem 2.4 implies that $P_f$ is $v$-separable, and Proposition 2.8 then gives the desired
equivalence. Convexity is established in Lemma C.1. □

The spectral radius of the twisted kernel given in (21) also has a simple representation,
when the function $h$ is chosen as a solution to the multiplicative Poisson equation:

Proposition 2.11 Assume that the Markov chain $\Phi$ satisfies condition (DV3+) with $W$ unbounded.
For real-valued $F \in L_\infty^{W_0}$, the twisted kernel $\check P_f$ satisfies (DV3+) with Lyapunov
function $\check V := V - \check F + c$ for $c > 0$ sufficiently large. Consequently, the semigroup generated
by the twisted kernel has a discrete spectrum in $L_\infty^{\check v}$, and its log-spectral radius has the
representation,

$$\check\Xi(G) = \Xi(F + G) - \Lambda(F), \quad G \in L_\infty^{W_0}.$$

Proof. The kernels $\check P_f$ and $P_f$ are related by a scaling and a similarity transformation,

$$\check P_f = \lambda(F)^{-1} I_{\check f^{-1}} P_f I_{\check f}.$$

It follows that (DV3+) (i) is satisfied with the Lyapunov function $\check V$, and we have $\check V \ge 1$
for sufficiently large $c$ since $\check f \in L_\infty^{v_\eta}$. The representation of $\check\Xi$ also follows from the above
relationship between $\check P_f$ and $P_f$.

The density condition (DV3+) (ii) follows similarly. Letting $b_r = \|\lambda(F)^{-1} f I_{C_W(r)}\|_\infty$, we
have, under the transition law $\check P_f$,

$$\check P_x\{\Phi(T_0) \in A,\ \tau_{C_W(r)} > T_0\} \le \check f^{-1}(x)\, b_r^{T_0}\, \check\beta_r(A), \quad A \in \mathcal{B},\ x \in C_W(r),$$

where $\check\beta_r(dx) = \beta_r(dx)\check f(x)$. To establish (DV3+) (ii) it remains to show that $\check f^{-1}$ is bounded
on $C_W(r)$.

Since the set $C_W(r)$ is small for the semigroup $\{P_f^t : t \ge 0\}$, there exists $\epsilon > 0$, $T_1 < \infty$,
and a probability distribution $\nu$ such that

$$P_f^{T_1}(x, dy) \ge \epsilon \nu(dy), \quad x \in C_W(r),\ y \in X.$$

It follows that

$$\lambda(F)^{T_1} \check f(x) = P_f^{T_1} \check f(x) \ge \epsilon \nu(\check f), \quad x \in C_W(r).$$

Consequently, $\check f^{-1}$ is bounded on $C_W(r)$. □

2.5 Doeblin and Uniform Conditions 

The uniform upper bound in condition (DV3+) (ii) is easily verified in many models. Consider
first the special case of a discrete time chain $\Phi$ with a countable state space $X$, and with $W$
such that $C_W(r)$ is finite for all $r < \|W\|_\infty$. In this case we may take $T_0 = 1$ in (DV3+) (ii),
and set

$$\beta_r(A) = \sum_{x \in C_W(r)} P(x, A), \quad A \in \mathcal{B}.$$

This is the starting point for the bounds obtained in [2].

A common assumption for general state space models is the following:

Condition (U) There exist $1 \le T_1 \le T_2$ and a constant $b_0 \ge 1$, such that

$$P^{T_1}(x, A) \le b_0 \frac{1}{T_2} \sum_{t=1}^{T_2} P^t(y, A), \quad x, y \in X,\ A \in \mathcal{B}. \tag{31}$$

See [13, 12], as well as [43, 27, 29]. It is obvious that (31) implies the validity of the upper
bound in our assumption (DV3+) (ii). Somewhat surprisingly, Condition (U) also implies a
corresponding lower bound, and moreover we may take the bounding measure equal to the
invariant measure $\pi$:

Proposition 2.12 Suppose that $\Phi$ is an aperiodic, $\psi$-irreducible chain. Then, condition (U)
holds if and only if there is a probability measure $\pi$ on $(X, \mathcal{B})$, a constant $N_0 \ge 1$, and a
sequence of non-negative numbers $\{\delta_n : n \ge N_0\}$, satisfying,

$$|P^n(x, A) - \pi(A)| \le \delta_n \pi(A), \quad A \in \mathcal{B},\ x \in X,\ n \ge N_0; \qquad \lim_{n\to\infty} \delta_n = 0. \tag{32}$$

Proof. It is enough to show that condition (U) implies the sequence of bounds given in (32).
Condition (U) implies the following minorization,

$$\sum_{t=1}^{T_2} P^t(y, A) \ge \epsilon \nu(A), \quad A \in \mathcal{B},\ y \in X,$$

where $\epsilon = T_2 b_0^{-1}$, and $\nu(A) = P^{T_1}(x_0, A)$, $A \in \mathcal{B}$, with $x_0 \in X$ arbitrary. Since the chain is
assumed aperiodic and $\psi$-irreducible, it follows that the chain is uniformly ergodic, a property
somewhat stronger than Doeblin's condition [34, Theorem 16.2.2]. Consequently, there exists
an invariant probability measure $\pi$, and constants $B_0 < \infty$, $b_0 > 0$ such that,

$$|||P^n - 1 \otimes \pi|||_1 \le e^{B_0 - b_0 n}, \quad n \in \mathbb{T}. \tag{33}$$

Condition (U) then gives the following upper bound: On multiplying (31) by $\pi(dy)$, and
integrating over $y \in X$, we obtain,

$$P^{T_1}(x, A) \le b_0 \pi(A), \quad x \in X,\ A \in \mathcal{B}.$$

Let $\Gamma$ denote the bivariate measure given by, $\Gamma(dx, dy) = \pi(dx)P^{T_1}(x, dy)$, for $x, y \in X$. The
previous bound implies that $\Gamma$ has a density $p(x, y; T_1)$ with respect to $\pi \times \pi$, where $p(\cdot, \cdot\,; T_1)$ is
jointly measurable, and may be chosen so that it satisfies the strict upper bound, $p(x, y; T_1) \le b_0$,
for $x, y \in X$. The probability measure $\Gamma$ has common one-dimensional marginals (equal to
$\pi$). Consequently, we must have $\int p(x, y; T_1)\pi(dx) = 1$ a.e. $y \in X$ $[\pi]$.
For $n \ge 2T_1$ we define the density $p(x, y; n)$ via,

$$p(x, y; n) := \int P^{n - T_1}(x, dz)\, p(z, y; T_1), \quad x, y \in X.$$

We have the upper bound $\sup_{x,y} p(x, y; n) \le b_0$ for all $n \ge T_1$ since $P^k$ is an $L_\infty$-contraction
for any $k \ge 0$. Combining this bound with (33) gives the strict bound,

$$|p(x, y; n) - 1| = \Bigl|\int P^{n-T_1}(x, dz)\bigl(p(z, y; T_1) - 1\bigr)\Bigr|$$
$$= \Bigl|\int P^{n-T_1}(x, dz)\, p(z, y; T_1) - \int \pi(dz)\, p(z, y; T_1)\Bigr|$$
$$\le b_0\, |||P^{n-T_1} - 1 \otimes \pi|||_1 \le b_0 e^{B_0 - b_0(n - T_1)}, \quad n \ge T_1,\ x, y \in X.$$

This easily implies the result. □

Note that, for the special case of reflected Brownian motion on a compact domain, a similar 
result is established in [3]. 
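On a finite state space the bound (32) can be checked numerically. In that setting, and assuming $\pi(y) > 0$ for every state, the smallest admissible constant is $\delta_n = \max_{x,y} |P^n(x,y)/\pi(y) - 1|$, since summing the pointwise bound over $y \in A$ gives the bound for every set $A$. The sketch below evaluates this constant for a hypothetical two-state chain; the matrix `P`, its invariant distribution `pi` and the helper names are illustrative assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def delta(P, pi, n):
    """Smallest delta_n with |P^n(x, y) - pi(y)| <= delta_n * pi(y) for all x, y."""
    Pn = P
    for _ in range(n - 1):
        Pn = mat_mul(Pn, P)
    size = len(P)
    return max(abs(Pn[x][y] / pi[y] - 1.0) for x in range(size) for y in range(size))

P = [[0.9, 0.1], [0.4, 0.6]]  # assumed two-state transition matrix
pi = [0.8, 0.2]               # its invariant distribution: pi P = pi
deltas = [delta(P, pi, n) for n in range(1, 21)]
```

For this chain the second eigenvalue of `P` is 0.5, so the constants halve at every step, which is the geometric decay that uniform ergodicity predicts.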

We have already noted in the above proof that the lower bound in (32) implies the Doeblin
condition, which is known to be equivalent to (V4) with $V$ bounded for a $\psi$-irreducible chain
[34, Theorem 16.2.2]. Consequently, condition (U) frequently holds for models on compact
state spaces but it rarely holds for models on $\mathbb{R}^n$. We summarize this and related
correspondences with drift criteria here.

Proposition 2.13 Suppose that $\Phi$ is an aperiodic, $\psi$-irreducible chain.

(i) If $\Phi$ satisfies Doeblin's condition, then (DV4) holds with respect to the Lyapunov
function $V \equiv 1$.

(ii) If $\Phi$ satisfies condition (U) and $V_0 : X \to [1,\infty)$ is given with $|||P|||_{v_0} < \infty$, then
(DV4) holds for a function $V : X \to [1,\infty)$ that is equivalent to $V_0$. And, trivially,
part (ii) of condition (DV3+) also holds.

Proof. Result (i) is a consequence of [34, Theorems 16.2.3 and 16.2.3] which state that the 
state space X is small under these assumptions, and hence (DV4) holds with V = 1. 
To prove (ii) we define,

$$V(x) := 1 + \log\Bigl(E_x\Bigl[\exp\Bigl(\epsilon \sum_{i=0}^{T_1-1} r^i V_0(\Phi(i))\Bigr)\Bigr]\Bigr), \quad x \in X,$$

where $r > 1$ is arbitrary, and $\epsilon > 0$ is to be determined. The functions $V$ and $V_0$ are equivalent
when $\epsilon \le T_1^{-1} r^{-T_1+1}$ since then by Hölder's inequality,

$$V(x) \le 1 + \frac{1}{T_1} \sum_{i=0}^{T_1-1} \log\bigl(E_x[\exp(T_1 \epsilon r^i V_0(\Phi(i)))]\bigr), \quad x \in X,$$

and the right hand side is in $L_\infty^{V_0}$ since $|||P^i|||_{v_0} < \infty$ for $i \ge 0$ under the assumptions of (ii).
Moreover, we have $V \ge \epsilon V_0$ by considering only the first term in the definition of $V$. Hence
$V \in L_\infty^{V_0}$ and $V_0 \in L_\infty^V$, which shows that $V$ and $V_0$ are equivalent. We assume henceforth that
this bound holds on $\epsilon$.

Hölder's inequality also gives the bound,

$$Pe^V(x) = e\,E_x\Bigl[\exp\Bigl(\epsilon \sum_{i=0}^{T_1-1} r^i V_0(\Phi(i+1))\Bigr)\Bigr]$$
$$\le e\,E_x\Bigl[\exp\Bigl(p r^{-1} \epsilon \sum_{i=1}^{T_1-1} r^i V_0(\Phi(i))\Bigr)\Bigr]^{1/p} E_x\Bigl[\exp\bigl(q r^{T_1-1} \epsilon V_0(\Phi(T_1))\bigr)\Bigr]^{1/q}$$

where we set $p = r > 1$ and $q = r(r-1)^{-1} > 1$. Under Condition (U) we have $\|P^{T_1} e^{V_0}\|_\infty < \infty$.
Consequently, provided $\epsilon > 0$ is chosen so that $q r^{T_1-1} \epsilon \le 1$ we then have, for some constant
$b_1$,

$$\mathcal{H}(V) := \log(Pe^V) - V \le -(1 - r^{-1})V + b_1.$$

This implies the result since the state space is small. □



2.6 Donsker-Varadhan Theory 

In Donsker and Varadhan's classic papers [14, 15, 16] there are two distinct sets of assumptions
that are imposed for ensuring the existence of a large deviations principle, roughly
corresponding to parts (i) and (ii) of our condition (DV3+).

Lyapunov criteria. The Lyapunov function criterion of [16, 43] is essentially equivalent
to (DV3), with the additional constraint that the function $W$ has compact sublevel sets; see
conditions (1)-(5) on [43, p. 34]. In the general case (when $X$ is not compact) this implies that
(DV3) holds with an unbounded $W$.

It is worth noting that the nonlinear generator is implicitly already present in the
Donsker-Varadhan work, visible both in the form of the rate function, and in the assumptions imposed
in [15, 16, 43].

Continuity and density assumptions. In [43] two additional conditions are imposed on
$\Phi$. It is assumed that the chain satisfies a strong version of the Feller property, and that for
each $x$, $P(x, dy)$ has a continuous density $p_x(y)$ with respect to some reference measure $a(dy)$
which is independent of $x$.

These rather strong assumptions are easily seen to imply condition (DV3+) (ii) when $W$
is coercive, so that the sets $C_W(r)$ are pre-compact.

3 Multiplicative Ergodic Theory 
3.1 Multiplicative Mean Ergodic Theorems 

The main results of this section are summarized in the following two theorems. In particular,
the multiplicative mean ergodic theorem given in (35) will play a central role in the proofs
of the large deviations limit theorems in Section 5. For all these results we will assume that
$\Phi$ satisfies (DV3) with an unbounded function $W$. As above, we let $\mathcal{B}^+$ denote the set of
functions $h : X \to [0, \infty]$ with $\psi(h) > 0$; for $A \in \mathcal{B}$ we write $A \in \mathcal{B}^+$ if $\psi(A) > 0$; and let $\mathcal{M}^+$
denote the set of positive measures on $\mathcal{B}$ satisfying $\mu(X) > 0$.

As in (6) in the Introduction, we choose an arbitrary measurable function $W_0 : X \to [1, \infty)$
in $L_\infty^W$, whose growth at infinity is strictly slower than $W$. This may be expressed in terms of
the weighted norm via,

$$\lim_{r \to \infty} \|W_0 I_{C_W(r)^c}\|_W = 0, \tag{34}$$

where $\{C_W(r)\}$ are the sublevel sets of $W$ defined in (5). The function $W_0$ is fixed throughout
this section.

Given $F \in L_\infty^{W_0}$ and an arbitrary $\alpha \in \mathbb{C}$, we recall from [32] the notation $P_\alpha := e^{\alpha F}P$, and

$$\mathcal{S}_\alpha := \mathcal{S}(P_\alpha) := \text{spectrum of } P_\alpha \text{ in } L_\infty^v,$$

where $v := e^V$ and $V$ is the Lyapunov function in (DV3+).

Next, we collect the main results of this section in the following theorem. Recall the
definition of the empirical measures $\{L_n\}$ from (14).

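Theorem 3.1 (Multiplicative Mean Ergodic Theorem) Assume that the Markov chain $\Phi$
satisfies condition (DV3+) with an unbounded $W$. For any $m > 0$, $M > 0$ there exist $\bar a \ge m$, $\bar\omega > 0$
such that for any real-valued $F \in L_\infty^{W_0}$ with $\|F\|_{W_0} \le M$, and any $\alpha$ in the compact set

$$\Omega = \Omega(\bar a, \bar\omega) := \{\alpha = a + i\omega \in \mathbb{C} : |a| \le \bar a, \text{ and } |\omega| \le \bar\omega\},$$

we have:

(i) There is a maximal, isolated eigenvalue $\lambda(\alpha F) \in \mathcal{S}_\alpha$ satisfying $|\lambda(\alpha F)| = \xi(\alpha F)$.
Furthermore, $\Lambda(\alpha F) := \log(\lambda(\alpha F))$ is analytic as a function of $\alpha \in \Omega$, and for real
$\alpha$ it coincides with the log-generalized principal eigenvalue of Section 2.4.

(ii) Corresponding to each eigenvalue $\lambda(\alpha F)$, there is an eigenfunction $\check f_\alpha \in L_\infty^v$ and
an eigenmeasure $\check\mu_\alpha \in \mathcal{M}_1^v$, where $v := e^V$, normalized so that $\check\mu_\alpha(\check f_\alpha) = \check\mu_\alpha(X) = 1$.
The function $\check f_\alpha$ solves the multiplicative Poisson equation,

$$P_\alpha \check f_\alpha = \lambda(\alpha F) \check f_\alpha,$$

and the measure $\check\mu_\alpha$ is a corresponding eigenmeasure: $\check\mu_\alpha P_\alpha = \lambda(\alpha F)\check\mu_\alpha$.

(iii) There exist constants $b_0 > 0$, $B_0 < \infty$, independent of $\alpha$, such that for all $x \in X$,
$\alpha \in \Omega$, $n \ge 1$,

$$\Bigl|E_x\bigl[\exp\bigl(n[\alpha\langle L_n, F\rangle - \Lambda(\alpha F)]\bigr)\bigr] - \check f_\alpha(x)\Bigr| \le |\alpha|\, v(x)\, e^{B_0 - b_0 n}. \tag{35}$$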



Proof. Lemma B.3 in the Appendix shows that (Pf a ) 2To+2 is ^-separable for any F <E L^°, 
and Theorem 3.5 then implies that the spectrum of Pf is discrete. It follows that solutions 
to the eigenvalue problem for Pf exist with fo G L^, jlo G M"-^ ■ The eigenvalue satisfies 
|^(^b)| = i{Fo) < oo. Smoothness of A is established in Proposition 4.3. 

Theorem 3.4 establishes the limit (iii) for a G C in a neighborhood of the origin. 

Consider then the twisted kernel P = Pj a , where a is real. Proposition 2.11 states that this 
satisfies (DV3+) with Lyapunov function V := V/f a . An application of Theorem 3.4 to this 
kernel then implies a uniform bound of the form (iii) for a in a neighborhood of a. For any 
given a > we may appeal to compactness of the line-segment {a G M : \a\ < a} to construct 
uJ > such that (35) holds for a G 0. □ 

We note that this result has many immediate extensions. In particular, if condition (DV3+)
is satisfied, then this condition also holds with $(V, W)$ replaced by $(1 - \eta + \eta V, W)$ for any
$0 < \eta < 1$. Consequently, $\check f \in L_\infty^{v_\eta}$ for any $0 < \eta < 1$ when $F \in L_\infty^{W_0}$.

Part (iii) of the theorem is at the heart of the proof of all the large deviations properties
we establish in Section 5. For example, from (35) we easily obtain that, for any $F \in L_\infty^{W_0}$, the
log-moment generating functions of the partial sums

$$S_n = \sum_{i=0}^{n-1} F(\Phi(i)) = n\langle L_n, F\rangle$$

converge uniformly and exponentially fast:

$$\frac{1}{n} \log E_x[\exp(\alpha n \langle L_n, F\rangle)] \to \Lambda(\alpha F), \quad n \to \infty. \tag{36}$$

We therefore think of $\Lambda(\alpha F)$ as the limiting log-moment generating function of the partial sums
$\{S_n\}$ corresponding to the function $F$, and much of our effort in the following two sections will
be devoted to examining the regularity properties of $\Lambda$ and its convex dual $\Lambda^*$.
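The limits in (35) and (36) can be observed numerically on a finite state space, where $E_x[\exp(S_n)] = (P_f^n 1)(x)$ with $P_f = e^F P$. The sketch below is an illustration under stated assumptions rather than the construction used in the paper: the two-state matrix `P`, the functional `F` and all function names are hypothetical, and the eigenvalue is obtained from the closed form for a 2x2 matrix.

```python
import math

P = [[0.9, 0.1], [0.4, 0.6]]   # assumed two-state transition matrix
F = [0.3, -0.2]                # assumed functional

def apply_Pf(u):
    """(P_f u)(x) = exp(F(x)) * sum_y P(x, y) u(y)."""
    return [math.exp(F[x]) * sum(P[x][y] * u[y] for y in range(2)) for x in range(2)]

# Perron eigenvalue of the 2x2 matrix P_f in closed form.
K = [[math.exp(F[x]) * P[x][y] for y in range(2)] for x in range(2)]
tr = K[0][0] + K[1][1]
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
lam = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
Lambda = math.log(lam)

def normalized_mean(n):
    """Return [E_x exp(S_n - n * Lambda)] over both states via n steps of P_f / lam."""
    u = [1.0, 1.0]
    for _ in range(n):
        u = [v / lam for v in apply_Pf(u)]
    return u

def log_mgf_rate(n, x):
    """(1/n) log E_x[exp(S_n)], recovered from the normalized mean."""
    return Lambda + math.log(normalized_mean(n)[x]) / n

f_limit = normalized_mean(200)   # numerical stand-in for the limiting eigenfunction
```

The normalized mean itself converges, at a geometric rate, to a solution of the multiplicative Poisson equation, while its logarithm divided by $n$ only recovers the eigenvalue at rate $1/n$.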

Following [32], next we give a weaker multiplicative mean ergodic theorem for $\alpha$ in a
neighborhood of the imaginary axis. Recall the following terminology: The asymptotic variance
$\sigma^2(F)$ of a function $F : X \to \mathbb{R}$ is defined to be the variance obtained in the corresponding Central
Limit Theorem for the partial sums of $F(\Phi(n))$, assuming it exists. For a $V$-uniformly ergodic
(or, equivalently, a geometrically ergodic) chain, the asymptotic variance is finite for any
function $F$ satisfying $F^2 \in L_\infty^V$, and [34, Theorem 17.0.1] gives the representation,

$$\sigma^2(F) = \lim_{n \to \infty} n E_\pi\bigl[(\langle L_n, F\rangle - \pi(F))^2\bigr]. \tag{37}$$

A function $F : X \to \mathbb{R}$ is called lattice if there are $h > 0$ and $0 \le d < h$, such that
$[F(x) - d]/h$ is an integer for all $x \in X$. The maximal $h$ for which this holds is called the
span of $F$. If the function $F$ can be written as a sum, $F = F_0 + F_\ell$, where $F_\ell$ is lattice with
span $h$ and $F_0$ has zero asymptotic variance then $F$ is called almost-lattice (and $h$ is its span).
Otherwise, $F$ is called strongly non-lattice. The lattice condition is discussed in more detail
in [32]. The proof of the following result follows from Theorem 3.1 and the arguments used in
the proof of [32, Theorem 4.2].

Theorem 3.2 (Bounds Around the $i\omega$-Axis) Assume that the Markov chain $\Phi$ satisfies
condition (DV3+) with an unbounded $W$, and that $F \in L_\infty^{W_0}$ is real-valued.

(NL) If $F$ is strongly non-lattice, then for any $m > 0$ and $0 < \omega_0 < \omega_1 < \infty$, there exist
$\bar a \ge m$, $b_0 > 0$, $B_0 < \infty$ (possibly different than in Theorem 3.1), such that

$$x \in X,\ n \ge 1, \tag{38}$$

for all $\alpha = a + i\omega$ with $|a| \le \bar a$ and $\omega_0 \le |\omega| \le \omega_1$, where $v := e^V$.

(L) If $F$ is almost-lattice with span $h > 0$, then for any $m > 0$ and $\epsilon > 0$, there exist $\bar a \ge m$,
$b_0 > 0$, and $B_0 < \infty$ (possibly different than above and in Theorem 3.1), such that (38)
holds for all $\alpha = a + i\omega$ with $|a| \le \bar a$ and $\epsilon \le |\omega| \le 2\pi/h - \epsilon$.

3.2 Spectral Theory of $v$-Separable Operators

The following continuity result allows perturbation analysis to establish a spectral gap under
(DV3). Recall that we set $v_\eta := e^{\eta V}$; for any real-valued $F \in L_\infty^{W_0}$ we define $f := e^F$; and we
let $P_f$ denote the kernel $P_f(x, dy) := f(x)P(x, dy)$.

Lemma 3.3 Suppose that $\Phi$ is $\psi$-irreducible and aperiodic, and that condition (DV3) is
satisfied. Then, for $0 < \eta \le 1$, $n \ge 1$, there exists $b_{\eta,n} < \infty$, such that for any $F, G \in L_\infty^{W_0}$,

Proof. We have from the definition of the induced operator norm,

$$|||P_f - P_g|||_{v_\eta} = \sup_{x \in X} \Bigl(|f(x) - g(x)| \frac{P v_\eta(x)}{v_\eta(x)}\Bigr) \le \sup_{x \in X} |f(x) - g(x)| \exp(-\eta\delta W(x) + \eta b).$$

Also, we have the elementary bounds, for all $x \in X$,

$$|f(x) - g(x)| = |e^{F(x)} - e^{G(x)}| \le |F(x) - G(x)|\, e^{|F(x)| + |G(x)|}$$
$$\le \|F - G\|_{W_0} W_0(x) \exp\bigl((\|F\|_{W_0} + \|G\|_{W_0})W_0(x)\bigr)$$
$$\le \|F - G\|_{W_0} \exp\bigl((1 + \|F\|_{W_0} + \|G\|_{W_0})W_0(x)\bigr).$$

Combining these bounds gives,

$$|||P_f - P_g|||_{v_\eta} \le \|F - G\|_{W_0} \sup_{x \in X}\Bigl(\exp\bigl((1 + \|F\|_{W_0} + \|G\|_{W_0})W_0(x) - \eta\delta W(x) + \eta b\bigr)\Bigr). \tag{39}$$

The supremum is bounded under the assumptions of the lemma, which establishes the
desired bound.
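The first of the elementary bounds used in this proof, $|e^a - e^b| \le |a - b|\,e^{|a| + |b|}$, holds for real or complex $a$ and $b$. A short derivation, supplied here for convenience and not quoted from the paper, is:

```latex
% Integrate the derivative of t -> exp(b + t(a-b)) over [0,1]:
e^{a} - e^{b} = (a-b)\int_0^1 e^{\,b + t(a-b)}\,dt .
% Bound the integrand using |e^{z}| = e^{\mathrm{Re}\,z} \le e^{|z|}:
\bigl|e^{\,b + t(a-b)}\bigr| \le e^{(1-t)|b| + t|a|} \le e^{|a|+|b|},
\qquad 0 \le t \le 1 .
% Hence
|e^{a} - e^{b}| \le |a-b|\,e^{|a|+|b|} .
```

Applying this with $a = F(x)$ and $b = G(x)$ gives the pointwise bound on $|f(x) - g(x)|$.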

We now show that, for any given $h \in L_\infty^{v_\eta}$, $F \in L_\infty^{W_0}$, the map $G \mapsto I_{G-F}P_f h$ represents
the Fréchet derivative of $P_f h$. We begin with the mean value theorem,

$$P_g h - P_f h - I_{G-F}P_f h = (G - F)[P_{f_\theta} h - P_f h]$$

where $F_\theta = \theta F + (1 - \theta)G$ for some $\theta : X \to (0, 1)$, and $f_\theta := e^{F_\theta}$. The bounds leading up to (39) then lead
to the following bound, for all $x \in X$,

$$\bigl|[P_g h - P_f h - I_{G-F}P_f h](x)\bigr|$$
$$\le \bigl(\|G - F\|_{W_0} W_0(x)\bigr)\bigl(\|F - G\|_{W_0} \exp\bigl((1 + \|F\|_{W_0} + \|F_\theta\|_{W_0})W_0(x) - \eta\delta W(x) + \eta b\bigr)\bigr).$$

It follows that there exists $b_1 < \infty$ such that

$$\|[P_g h - P_f h - I_{G-F}P_f h]\|_{v_\eta} \le b_1 \|F - G\|_{W_0}^2, \quad G \in L_\infty^{W_0},\ \|F - G\|_{W_0} \le 1,$$

which establishes Fréchet differentiability. □

Next we present a local result, in the sense that it holds for all $F$ with sufficiently small
$L_\infty^W$-norm, where the precise bound on $\|F\|_W$ is not explicit. Although a value can be computed as
in [32], it is not of a very attractive form. Note that Theorem 3.4 does not require the density
condition used in (DV3+).

The definition of the empirical measures $\{L_n\}$ is given in (14).

Theorem 3.4 (Local Multiplicative Mean Ergodic Theorem) Suppose that $\Phi$ is $\psi$-irreducible
and aperiodic, and that condition (DV3) is satisfied. Then there exists $\epsilon_0 > 0$, $0 < \eta_0 \le 1$,
such that for any complex-valued $F \in L_\infty^W$ satisfying $\|F\|_W \le \epsilon_0$, and any $0 < \eta \le \eta_0$:

(i) There exist solutions $\lambda$, $\check f$ and $\check\mu$ to the eigenvalue problems

$$P_f \check f = \lambda \check f, \qquad \check\mu P_f = \lambda \check\mu. \tag{40}$$

These solutions satisfy $\check f \in L_\infty^{v_\eta}$, $\check\mu \in \mathcal{M}_1^{v_\eta}$, $\check\mu(X) = \check\mu(\check f) = 1$, and the eigenvalue
$\lambda = \lambda(F) \in \mathbb{C}$ satisfies $|\lambda| = \xi(\{P_f^n\})$. Moreover, the solutions are uniformly
continuous on this domain: For some $b_\eta < \infty$,

$$|\Lambda(F) - \Lambda(G)| \le b_\eta \|F - G\|_W, \qquad \|\check f - \check g\|_{v_\eta} \le b_\eta \|F - G\|_W,$$

whenever $F, G \in L_\infty^W$ satisfy $\|F\|_W \le \epsilon_0$, $\|G\|_W \le \epsilon_0$.

(ii) There exist positive constants $B_0$ and $b_0$ such that, for all $g \in L_\infty^{v_\eta}$, $x \in X$, $n \ge 1$,
we have

$$\Bigl|E_x\bigl[\exp(n\langle L_n, F\rangle - n\Lambda(F))\, g(\Phi(n))\bigr] - \check f(x)\check\mu(g)\Bigr| \le \|g\|_{v_\eta} e^{\eta V(x) + B_0 - b_0 n}$$

$$\Bigl|E_x\bigl[\exp(n\langle L_n, F\rangle - n\Lambda(F))\bigr] - \check f(x)\Bigr| \le \|F\|_W e^{\eta V(x) + B_0 - b_0 n} \tag{41}$$

with $\check f$, $\check\mu$, $\lambda(F)$ given as in (i).

(iii) If $V$ is bounded on the set $C$ used in (DV3) then we may take $\eta_0 = 1$.

Proof. Assumption (DV3) combined with Theorem 2.2 implies that $P$ is $v_\eta$-uniform for all
$\eta > 0$ sufficiently small (when $V$ is bounded on $C$ then (DV3) implies $v$-uniformity, so we may
take $\eta = 1$).

It follows that the inverse $[I - P + 1 \otimes \pi]^{-1}$ exists as a bounded linear operator on $L_\infty^{v_\eta}$
[34, Theorem 16.0.1]. An application of Lemma 3.3 implies that the kernels $P_f$ converge to $P$
in norm,

$$|||P - P_f|||_{v_\eta} \to 0, \quad \text{as } \|F\|_W \to 0, \quad 0 < \eta \le 1.$$

Consequently, there exists $\epsilon_1 > 0$ such that $[Iz - P_f + 1 \otimes \pi]^{-1}$ is bounded for all $z \in \mathbb{C}$
satisfying $|z - 1| < \epsilon_1$, and all $F \in L_\infty^W$ satisfying $\|F\|_W < \epsilon_1$.

We have the explicit representation, writing $A := [(z - 1)I + I_{1-f}P]$, $H := [I - P + 1 \otimes \pi]$,

$$[Iz - P_f + 1 \otimes \pi]^{-1} = [H + A]^{-1} = [I + H^{-1}A]^{-1}H^{-1}.$$

The first term on the right hand side exists as a power series in $H^{-1}A$, provided

$$|||A|||_{v_\eta} < (|||H^{-1}|||_{v_\eta})^{-1}. \tag{42}$$

Moreover, in this case we obtain the bound, 

For any $F \in L_\infty^W$ we have the upper bound, $|F| \le [\|F\|_W \delta^{-1}]\,\delta W$, where $\delta > 0$ is given in
(DV3). Recalling the definition of the log-generalized principal eigenvalue functional $\Lambda$ from
Section 2.4, and assuming that $\theta := \|F\|_W \delta^{-1} < 1$, we may apply the convexity of $\Lambda$ (see
Lemma C.1) to obtain the upper bound,

$$|\Lambda(F)| \le \Lambda(\theta\delta W) \le \theta\Lambda(\delta W) \le \theta b = \|F\|_W \delta^{-1} b \tag{44}$$

where $b$ is given in (DV3).

From (44) we conclude that there is a constant eo > such that eo < \e\, and (42) together 
with the bound \\{F) - 1| < \e\ hold whenever \F\ W < e . For such F, it follows that (43) 
holds, and hence Pf is v^-uniform. Setting H := [IX(F) — Pf + 1 (g) it] we may express the 
eigenf unction and eigenmeasure explicitly as: 

f := c\H i. , ci := ( - - 

H := c 2 nH-\ o 2 :=(-^ I _). 
The remaining results follow as in [32, Theorem 4.1]. □

In order to extend Theorem 3.4 to a non-local result we invoke the density condition
in (DV3+) (ii). In fact, any such extension seems to require some sort of a density assumption.
Recall that, in the notation of Section 2.2 and Section 2.4, we say that the spectrum $\mathcal{S}$ in $L_\infty^v$
of a linear operator $P : L_\infty^v \to L_\infty^v$ is discrete, if for any compact set $K \subset \mathbb{C} \setminus \{0\}$, $\mathcal{S} \cap K$
is finite and contains only poles of finite multiplicity. We saw earlier that condition (DV3+)
implies that $P^{2T_0+2}$ is $v$-separable. Next we show in turn that any $v$-separable linear operator
$P$ has a discrete spectrum in $L_\infty^v$.

Theorem 3.5 ($v$-Separability $\Rightarrow$ Discrete Spectrum) If the linear operator $P : L_\infty^v \to L_\infty^v$ is
bounded and $P^{T_0} : L_\infty^v \to L_\infty^v$ is $v$-separable for some $T_0 \ge 1$, then $P$ has a discrete spectrum
in $L_\infty^v$.

Proof. Assume first that $T_0 = 1$. For a given $\epsilon > 0$, set $P = K + A$ with $|||A|||_v \le \epsilon$, and
with $K$ a finite-rank operator. Write $K = \sum_{i=1}^n s_i \otimes \nu_i$, and for each $z \in \mathbb{C}$ define the complex
numbers $\{m_{ij}(z)\}$ via

$$m_{ij}(z) = \langle \nu_i, [Iz - A]^{-1} s_j \rangle, \quad 1 \le i, j \le n.$$

Let $M(z)$ denote the corresponding $n \times n$ matrix, and set $\gamma(z) = \det(I - M(z))$. The function
$\gamma$ is analytic on $\{|z| > |||A|||_v\}$ because on this domain we have

$$[Iz - A]^{-1} = \sum_{n=0}^\infty z^{-n-1} A^n, \qquad |||[Iz - A]^{-1}|||_v \le (|z| - |||A|||_v)^{-1} < \infty.$$

Moreover, this function satisfies $\gamma(z) \to 1$ as $|z| \to \infty$, from which we may conclude that
the equation $\gamma(z) = 0$ has at most a finite number of solutions in any compact subset of
$\{|z| > |||A|||_v\}$.

As argued in the proof of Theorem 3.4, if $\gamma(z) \ne 0$, then we have,

$$[Iz - P]^{-1} = [(Iz - A) - K]^{-1} = [Iz - A]^{-1}\bigl[I - K[Iz - A]^{-1}\bigr]^{-1}.$$

Conversely, this inverse does not exist when $\gamma(z) = 0$. Recalling that $\epsilon \ge |||A|||_v$, we conclude
that $\mathcal{S}(P) \cap \{z : |z| > \epsilon\} = \{z : \gamma(z) = 0\}$. The right hand side denotes a finite set, and $\epsilon > 0$
is arbitrary. Consequently, it follows that the spectrum of $P$ is discrete.

If $T_0 > 1$ then from the foregoing we may conclude that the spectrum of $P^{T_0}$ is discrete.
The conclusion then follows from the identity

$$[Iz - P]^{-1} = \Bigl(\sum_{k=0}^{T_0-1} z^{T_0-k-1} P^k\Bigr)[Iz^{T_0} - P^{T_0}]^{-1}, \quad z \in \mathbb{C}. \qquad □$$
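The identity used in the last step of the proof of Theorem 3.5, which relates the resolvent of $P$ to the resolvent of $P^{T_0}$, can be checked by a telescoping sum. The following short verification is supplied for convenience and is written in the form $[Iz - P]^{-1} = (\sum_{k=0}^{T_0-1} z^{T_0-k-1}P^k)[Iz^{T_0} - P^{T_0}]^{-1}$; it assumes only that $z^{T_0}$ lies outside the spectrum of $P^{T_0}$.

```latex
% Multiply the finite sum by (Iz - P) and shift the index in the second sum:
(Iz - P)\sum_{k=0}^{T_0-1} z^{T_0-k-1} P^{k}
  = \sum_{k=0}^{T_0-1} z^{T_0-k} P^{k} - \sum_{j=1}^{T_0} z^{T_0-j} P^{j}
  = z^{T_0} I - P^{T_0}.
% All operators involved are polynomials in P and therefore commute, so whenever
% [I z^{T_0} - P^{T_0}]^{-1} exists as a bounded operator,
[Iz - P]^{-1} = \Bigl(\sum_{k=0}^{T_0-1} z^{T_0-k-1} P^{k}\Bigr)\,[I z^{T_0} - P^{T_0}]^{-1}.
```

In particular, a point $z$ can belong to the spectrum of $P$ only if $z^{T_0}$ belongs to the spectrum of $P^{T_0}$, which is how discreteness transfers from $P^{T_0}$ to $P$.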

For each $n \ge 1$, we define the nonlinear operators $\Lambda_n$ and $\mathcal{G}_n$ on the space of real-valued
functions $F \in L_\infty^{W_0}$, via,

$$\Lambda_n(F) := \frac{1}{n} \log E_x[\exp(n\langle L_n, F\rangle)]$$
$$\mathcal{G}_n(F) := \log E_x\bigl[\exp\bigl(n[\langle L_n, F\rangle - \Lambda(F)]\bigr)\bigr], \quad F \in L_\infty^{W_0},\ x \in X.$$

The following result implies that both sequences of operators $\{\mathcal{G}_n\}$ and $\{\Lambda_n\}$ are convergent.
Smoothness properties of the limiting nonlinear operators are established in Propositions 4.3
and 4.5.

Proposition 3.6 Suppose that (DV3+) holds with an unbounded function $W$. Then there
exists a nonlinear operator $\mathcal{G} : L_\infty^{W_0} \to L_\infty^V$ such that $\check f = e^{\mathcal{G}(F)}$ is a solution to the multiplicative
Poisson equation for each $F \in L_\infty^{W_0}$. Moreover, for each $F_0 \in L_\infty^{W_0}$ and $\delta_0 > 0$ we have,

$$\sup_{\|F - F_0\|_{W_0} \le \delta_0} \|\mathcal{G}_n(F) - \mathcal{G}(F)\|_V \to 0,$$
$$\sup_{\|F - F_0\|_{W_0} \le \delta_0} \|\Lambda_n(F) - \Lambda(F)\|_V \to 0, \quad n \to \infty.$$

Proof. Note that the second bound follows from the first. So, let $\delta_0 > 0$ and $F_0 \in L_\infty^{W_0}$ be
given, and consider an arbitrary $F \in L_\infty^{W_0}$ satisfying $\|F - F_0\|_{W_0} \le \delta_0$. We define $\check F_n := \mathcal{G}_n(F)$
for $n > 0$, and $\check F = \mathcal{G}(F) := \log(\check f)$, with $\check f$ given in Theorem 3.1. We show below that for any
$\eta > 0$, there exists $b(\eta) < \infty$ such that for all such $F$,

$$|\check F(x)| \le \eta V(x) + b(\eta), \quad x \in X;$$
$$|\check F_n(x)| \le \eta V(x) + b(\eta), \quad x \in X,\ n \ge 1. \tag{45}$$

Taking this for granted for the moment, observe that we then have, for any $r \ge 1$, $n \ge 1$,

$$\sup_{\|F - F_0\|_{W_0} \le \delta_0} \|(\check F_n - \check F) I_{C_V(r)^c}\|_V \le 2[\eta + b(\eta) r^{-1}].$$

Moreover, Theorem 3.1 implies that for any $r \ge 1$,

$$\sup_{\|F - F_0\|_{W_0} \le \delta_0} \|(\check F_n - \check F) I_{C_V(r)}\|_V \to 0, \quad \text{exponentially fast as } n \to \infty,$$

provided we have the uniform bound (45). Putting these two conclusions together, and letting
$r \to \infty$ then gives,

$$\limsup_{n \to \infty} \sup_{\|F - F_0\|_{W_0} \le \delta_0} \|\check F_n - \check F\|_V \le 2\eta.$$

This then proves the desired uniform convergence, since $\eta > 0$ is arbitrary.

We now prove the uniform bound (45). We begin with consideration of the functions
$\{\check F : \|F - F_0\|_{W_0} \le \delta_0\}$, since the corresponding bounds on $\{\check F_n\}$ then follow relatively easily.

We know that $\check f \in L_\infty^{v_\eta}$ from Theorem 3.1. (If (DV3+) holds, then it also holds with $V$
replaced by $(1 - \eta) + \eta V$ for any $0 < \eta < 1$.) This implies that $\check F(x) \le \eta V(x) + \log \|\check f\|_{v_\eta}$, for
$x \in X$. Hence it remains to obtain a lower bound.

Let $\tau = \min\{k \ge 1 : |\check F(\Phi(k))| \le r\}$, with $r \ge 1$ chosen so that $\{x : |\check F(x)| \le r\} \in \mathcal{B}^+$.
The stochastic process below is a positive local martingale,

$$m(t) = \exp\bigl(t\langle L_t, (F - \Lambda(F))\rangle\bigr)\, \check f(\Phi(t)), \quad t \in \mathbb{Z}_+.$$

The local martingale property combined with Fatou's Lemma then gives the bound,

$$\check f(x) \ge E_x\bigl[\exp\bigl(\tau\langle L_\tau, (F - \Lambda(F))\rangle\bigr)\, \check f(\Phi(\tau))\bigr], \quad x \in X,$$

and then by Jensen's inequality and the definition of $\tau$,

$$\check F(x) \ge E_x\bigl[\check F(\Phi(\tau)) + \tau\langle L_\tau, (F - \Lambda(F))\rangle\bigr] \ge -r - E_x\bigl[\tau\langle L_\tau, |F + \Lambda(F)|\rangle\bigr], \quad x \in X. \tag{46}$$

The right hand side is bounded below by $-k_0(V + 1)$ for some finite $k_0$ by (V3) and [34,
Theorem 14.0.1]. However, this bound can be improved. Since $F \in L_\infty^{W_0}$, and since $W \in L_\infty^V$,
with $(W_0, W)$ satisfying (6), we can find, for any $\eta_0 > 0$, a constant $b_0(\eta_0)$ and a small set
$S_{\eta_0}$ satisfying

$$|F + \Lambda(F)| \le b_0(\eta_0) I_{S_{\eta_0}} + \eta_0 V. \tag{47}$$

Small sets are special (see [39]), which implies that

$$\sup_{x \in X} E_x\bigl[\tau\langle L_\tau, I_{S_{\eta_0}}\rangle\bigr] < \infty. \tag{48}$$

Moreover, it follows from [34, Theorem 14.0.1] that for some $b_0 < \infty$,

$$E_x[\tau\langle L_\tau, V\rangle] \le b_0 V(x), \quad x \in X. \tag{49}$$

Combining the bounds (46-49) establishes (45) for $\check F$.

From (35) in Theorem 3.1 we have, for any $\eta > 0$, constants $B_\eta < \infty$, $b_\eta > 0$ such that,
whenever $\|F - F_0\|_{W_0} \le 1$,

$$\check F_n(x) \le \check F(x) + \log\big(1 + \exp(\eta V(x) - \check F(x) + B_\eta - b_\eta n)\big), \qquad n \ge 1.$$

From the foregoing we see that the right hand side is bounded by $2\eta V + b(2\eta)$ for some $b(2\eta) < \infty$
and all n.

To complete the proof, we show that a corresponding lower bound holds: By definition of
$\check f_n$ and an application of Jensen's inequality we have for all n > 0,

$$\check f_n(x)\, \check f^{-1}(x) = \check E_x\big[\check f^{-1}(\Phi(n))\big] \ge \big(\check E_x[\check f(\Phi(n))]\big)^{-1},$$

where the expectation is with respect to the process with transition kernel $\check P_{\check f}$. On taking
logarithms, and appealing to the mean ergodic limit for the twisted process, for constants
$B_\eta < \infty$, $b_\eta > 0$,

$$\check F_n(x) - \check F(x) \ge -\log\big(\check E_x[\check f(\Phi(n))]\big) \ge -\log\big(\check\pi(\check f) + \exp(\eta V(x) + B_\eta - n b_\eta)\big), \qquad n \ge 1.$$

This together with the bounds obtained on $\check F$ shows that (45) does hold. □



4 Entropy, Duality and Convexity 

In this section we consider structural properties of the operators $\mathcal{G}$, $\mathcal{H}$ and the functional $\Lambda$.
As above, we assume throughout that $\Phi$ satisfies (DV3+) with an unbounded function W,
and we choose and fix an arbitrary function $W_0 \in L_\infty^{W}$ as in (34). Also, throughout this section
we restrict attention to real-valued functions in $L_\infty^{W_0}$ and real-valued measures in $\mathcal{M}_1^{W_0}$, since
one of our goals is to establish convexity and present Taylor series expansions of $\mathcal{G}$, $\mathcal{H}$, and
$\Lambda$ acting on $L_\infty^{W_0}$. Recall from Proposition 2.8 that the log-generalized principal eigenvalue $\Lambda$
coincides with the log-spectral radius $\Xi$ on this domain.

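The convex dual of the functional $\Lambda : L_\infty^{W_0} \to \mathbb{R}$ is defined for $\mu \in \mathcal{M}_1^{W_0}$ via,

$$\Lambda^*(\mu) := \sup\{\langle \mu, F\rangle - \Lambda(F) : F \in L_\infty^{W_0}\}. \tag{50}$$

A probability measure $\mu \in \mathcal{M}_1^{W_0}$ and a function $F \in L_\infty^{W_0}$ form a dual pair if the above
supremum is attained, so that $\Lambda(F) + \Lambda^*(\mu) = \langle \mu, F\rangle$.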

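The main result of this section is a proof that $\Lambda^*$ can be expressed in terms of relative
entropy (recall (17)) provided that we extend the definition to include bivariate measures on
$(X \times X, \mathcal{B} \times \mathcal{B})$. Throughout this section we let M denote a generic function on $X \times X$, and $\Gamma$ a
generic measure on $(X \times X, \mathcal{B} \times \mathcal{B})$. The definitions of $L_\infty^{W_0}$ and $\mathcal{M}_1^{W_0}$ are extended as follows:

$$\mathcal{M}_{1,2}^{W_0} := \Big\{\Gamma : \|\Gamma\|_{W_0} := \int [W_0(x) + W_0(y)]\, |\Gamma(dx, dy)| < \infty\Big\}. \tag{52}$$

The following proposition shows that consideration of the bivariate chain

$$\Psi(t) = \begin{pmatrix} \Phi(t) \\ \Phi(t+1) \end{pmatrix}, \quad t \ge 0, \qquad \Psi(0) \in X \times X, \tag{53}$$

allows us to extend the domain of $\Lambda$ to include bivariate functions, and then $\Lambda^*$ is defined on
bivariate measures via

$$\Lambda^*(\Gamma) := \sup_{M \in L_{\infty,2}^{W_0}} \big(\langle \Gamma, M\rangle - \Lambda(M)\big), \qquad \Gamma \in \mathcal{M}_{1,2}^{W_0}. \tag{54}$$

For any univariate measure $\mu$ and transition kernel $\check P$, we write $\mu \odot \check P$ for the bivariate measure
$\mu \odot \check P(dx, dy) := \mu(dx)\check P(x, dy)$. In particular, Proposition 4.1 shows that if $\Phi$ satisfies (DV3+)
with an unbounded W, then so does $\Psi$.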

Proposition 4.1 The following implications hold for any Markov chain $\Phi$ with corresponding
bivariate chain $\Psi$:

(i) If $\Phi$ is $\psi$-irreducible, then $\Psi$ is $\psi_2$-irreducible, with $\psi_2 := \psi \odot P$.

(ii) If C is a small set for $\Phi$ then $X \times C$ is small for $\Psi$.

(iii) If $C \in \mathcal{B}$, $\mu$, and $T_0 \ge 1$ satisfy $P^{T_0}(y, A) \le \mu(A)$ for $y \in C$, $A \in \mathcal{B}$, then on
setting $C_2 = X \times C$ and $\mu_2 = \mu \odot P$ we have,

$$P_2^{T_0+1}((x,y), A_2) \le \mu_2(A_2), \qquad (x,y) \in C_2,\ A_2 \in \mathcal{B} \times \mathcal{B},$$

where $P_2$ denotes the transition kernel for $\Psi$.

(iv) If $\nu \in \mathcal{M}^+$ is small for $\Phi$ then $\nu_2 := \nu \odot P$ is small for $\Psi$.

(v) Suppose that $\Phi$ satisfies the drift condition (DV3). Then $\Psi$ also satisfies the following
version of (DV3),

$$\mathcal{H}_2(V_2) \le -\delta W_2 + b I_{C_2}, \qquad \text{on } S_{V_2},$$

where $\mathcal{H}_2$ is the nonlinear generator for $\Psi$, $C_2 = X \times C$, and

$$V_2(x,y) = V(y) + \tfrac12 \delta W(x), \qquad W_2(x,y) = \tfrac12 (W(x) + W(y)), \qquad x, y \in X.$$

Proof. To prove (i) consider any set $A_2 \in \mathcal{B} \times \mathcal{B}$ with $\psi_2(A_2) > 0$. Define

$$g(x) = \int_{y \in X} P(x, dy)\, I_{A_2}(x, y), \qquad x \in X.$$

Then we have $\psi(g) > 0$, and hence by $\psi$-irreducibility of $\Phi$, $\sum_{k=0}^\infty P^k g(x) > 0$, for all $x \in X$.
It follows immediately that $\sum_{k=0}^\infty P_2^k I_{A_2}(x, y) > 0$, for all $x, y \in X$, from which we deduce that
$\Psi$ is $\psi_2$-irreducible. This proves (i), and (ii)-(iv) are similar.

To see (v), observe that under (DV3),

$$\log P_2 e^{V_2}(x,y) = \log \int P(y, dz)\, e^{\frac12 \delta W(y) + V(z)} \le \tfrac12 \delta W(y) + [V(y) - \delta W(y) + b I_C(y)] = V_2(x,y) - \delta W_2(x,y) + b I_{C_2}(x,y), \qquad x \in X,\ y \in S_V. \qquad □$$
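As a small finite-state illustration of the bivariate construction (not part of the original argument), the sketch below builds the kernel $P_2$ of $\Psi(t) = (\Phi(t), \Phi(t+1))$ from a hypothetical 3-state kernel P and checks numerically that $\pi \odot P$ is invariant for $P_2$. The matrix `P` and the helper `stationary` are assumptions made only for this example.

```python
from itertools import product

# Hypothetical 3-state transition kernel, chosen only for illustration.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.1, 0.5]]
n = len(P)

def stationary(K, iters=5000):
    """Invariant distribution of K by repeated left-multiplication."""
    pi = [1.0 / len(K)] * len(K)
    for _ in range(iters):
        pi = [sum(pi[x] * K[x][y] for x in range(len(K))) for y in range(len(K))]
    return pi

# Bivariate kernel: from (x, y) the chain moves to (y, z) with probability P(y, z).
pairs = list(product(range(n), repeat=2))
P2 = {(s, t): (P[s[1]][t[1]] if t[0] == s[1] else 0.0) for s in pairs for t in pairs}

pi = stationary(P)
pi2 = {(x, y): pi[x] * P[x][y] for (x, y) in pairs}  # the bivariate measure pi (.) P

# Largest violation of the invariance equation pi2 P2 = pi2.
residual = max(abs(sum(pi2[s] * P2[(s, t)] for s in pairs) - pi2[t]) for t in pairs)
total_mass = sum(pi2.values())
```

The dictionary `P2` only has mass on transitions $(x,y) \to (y,z)$, which is the structure used in the proof of part (v) above.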

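We show in Theorem 4.2 that the convex dual may be expressed as relative entropy when
$\Gamma$ is a probability measure in $\mathcal{M}_{1,2}^{W_0}$:

$$\Lambda^*(\Gamma) = H(\Gamma \,\|\, \check\pi \odot P) = \int_{X \times X} \log\Big(\frac{d\Gamma}{d[\check\pi \odot P]}(x,y)\Big)\, \Gamma(dx, dy), \tag{55}$$

where $\check\pi$ is the first marginal of $\Gamma$ and $\check\pi \odot P$ denotes the bivariate measure $[\check\pi \odot P](dx, dy) =
\check\pi(dx)P(x, dy)$. When $\Lambda^*(\Gamma) < \infty$, we show in Lemma 4.11 that the two marginals agree.
Consequently, $\Gamma$ may be expressed as $\Gamma(dx,dy) = [\check\pi \odot \check P](dx,dy) = \check\pi(dx)\check P(x,dy)$, where $\check P$
is a transition kernel and $\check\pi$ is an invariant measure for $\check P$.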

Theorem 4.2 (Identification of $\Lambda^*$ as Relative Entropy) Suppose that (DV3+) holds with an
unbounded function W. Then:

(i) For any probability measure $\Gamma \in \mathcal{M}_{1,2}^{W_0}$, if $\Lambda^*(\Gamma) < \infty$ then the one-dimensional
marginals $\{\Gamma_1, \Gamma_2\}$ agree. Consequently, letting $\check\pi = \Gamma_1$ denote the first marginal
of $\Gamma$ we can write, for some transition kernel $\check P$,

$$\Gamma(dx, dy) = \check\pi(dx)\check P(x, dy),$$

where $\check\pi$ is an invariant measure for the transition kernel $\check P$.

(ii) If $\Lambda^*(\Gamma) < \infty$ for some probability measure $\Gamma \in \mathcal{M}_{1,2}^{W_0}$, then

$$\Lambda^*(\Gamma) = H(\Gamma \,\|\, \check\pi \odot P) := \int \log\Big(\frac{d\Gamma}{d[\check\pi \odot P]}(x,y)\Big)\, \Gamma(dx, dy), \tag{56}$$

where $[\check\pi \odot P](dx, dy) := \check\pi(dx)P(x, dy)$ and $\check\pi$ is the first marginal of $\Gamma$.

(iii) For any c > 0, the set $\{\Gamma \in \mathcal{M}_{1,2}^{W_0} : \Lambda^*(\Gamma) \le c\}$ is a bounded subset of $\mathcal{M}_{1,2}^{W_0}$.

Proof. Any probability measure $\Gamma$ on $(X \times X, \mathcal{B} \times \mathcal{B})$ can be decomposed as $\Gamma(dx,dy) =
\check\pi(dx)\check P(x, dy)$, where $\check\pi$ is the first marginal for $\Gamma$. We show in Lemma 4.11 that the marginals
of $\Gamma$ must agree when $\Lambda^*(\Gamma) < \infty$, and this establishes (i).

Finiteness of $\Lambda^*(\Gamma)$ also implies that $\Gamma$ is absolutely continuous with respect to $\check\pi \odot P$.
This follows from Proposition 4.6 (iv) below, applied to the bivariate chain $\Psi$. Consequently,
the transition kernel can be expressed as $\check P(x,dy) = m(x,y)P(x,dy)$, for $x, y \in X$, for some
measurable function $m : X \times X \to [0, \infty]$.

With $M = \log m$, Proposition C.10 gives the upper bound,

$$\Lambda^*(\Gamma) \le \langle \Gamma, M\rangle = H(\Gamma \,\|\, \check\pi \odot P).$$

We apply Proposition C.4 to obtain a corresponding lower bound: There is a sequence $\{M_k :
k \ge 1\} \subset L_\infty$ such that $M_k \to M$ point-wise, $|M_k| \le |M|$ for all $k \ge 1$, and $\Lambda(M_k) \to \Lambda(M)$,
as $k \to \infty$. Moreover, we have $\Lambda(M) = 0$ since $\check P(x, dy) = m(x, y)P(x, dy)$ is a transition kernel
for a positive recurrent Markov chain, and hence 1-recurrent [39]. Consequently,

$$\Lambda^*(\Gamma) \ge \langle \Gamma, M_k\rangle - \Lambda(M_k) \to \langle \Gamma, M\rangle, \qquad k \to \infty.$$

We thus obtain the identity $\Lambda^*(\Gamma) = \langle \Gamma, M\rangle$, which is precisely (ii).

Finally, part (iii) follows from Proposition 4.6 (iii) combined with Proposition 4.10. □
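The identification of $\Lambda^*$ with relative entropy can be checked by hand on a two-state chain. The following sketch is only an illustration under stated assumptions: the kernels `P` and `Pc` (standing in for $P$ and $\check P$) and the trial functions are hypothetical, and $\Lambda(M)$ is computed as the logarithm of the Perron root of the matrix $e^{M(x,y)}P(x,y)$, which is how the bivariate functional reduces in the finite-state case. With $M = \log(\check P / P)$ the twisted matrix is $\check P$ itself, so $\Lambda(M) = 0$ and $\langle\Gamma, M\rangle$ equals the relative entropy; other choices of M should give smaller values of $\langle\Gamma, M\rangle - \Lambda(M)$.

```python
import math

P = [[0.7, 0.3], [0.4, 0.6]]    # reference kernel (hypothetical)
Pc = [[0.5, 0.5], [0.2, 0.8]]   # twisted kernel, playing the role of P-check (hypothetical)

def stationary2(K):
    """Invariant distribution of a 2-state kernel in closed form."""
    a, b = K[0][1], K[1][0]
    return [b / (a + b), a / (a + b)]

def log_perron2(A):
    """Logarithm of the Perron root of a nonnegative 2x2 matrix."""
    tr = A[0][0] + A[1][1]
    disc = (A[0][0] - A[1][1]) ** 2 + 4.0 * A[0][1] * A[1][0]
    return math.log(0.5 * (tr + math.sqrt(disc)))

def Lambda(M):
    """Lambda(M) for a bivariate M: log spectral radius of e^{M(x,y)} P(x,y)."""
    return log_perron2([[math.exp(M[x][y]) * P[x][y] for y in range(2)] for x in range(2)])

pic = stationary2(Pc)
Gamma = [[pic[x] * Pc[x][y] for y in range(2)] for x in range(2)]  # Gamma = pic (.) Pc

def pairing(M):
    """The pairing <Gamma, M>."""
    return sum(Gamma[x][y] * M[x][y] for x in range(2) for y in range(2))

M_star = [[math.log(Pc[x][y] / P[x][y]) for y in range(2)] for x in range(2)]
H_rel = pairing(M_star)  # relative entropy H(Gamma || pic (.) P)

# At M_star the supremum is attained: the gap should vanish up to rounding.
gap_at_M_star = H_rel - (pairing(M_star) - Lambda(M_star))

# Arbitrary trial functions give non-negative slack.
trials = [[[0.3, -0.2], [0.1, 0.5]], [[-1.0, 0.4], [0.0, 0.0]], [[2.0, 1.0], [-0.5, 0.3]]]
slack = [H_rel - (pairing(M) - Lambda(M)) for M in trials]
```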



4.1 Convexity and Taylor Expansions 

We now return to consideration of the univariate chain $\Phi$ and establish some regularity and
smoothness properties for the (univariate) functional $\Lambda$ and the nonlinear operators $\mathcal{H}$ and $\mathcal{G}$.

We recall the definition of the twisted kernel $\check P_h$ from (21), and for any $h : X \to (0, \infty)$ we
define the bilinear and quadratic forms,

$$\langle\langle F, G\rangle\rangle_h := \check P_h(FG) - (\check P_h F)(\check P_h G), \qquad Q_h(F) := \langle\langle F, F\rangle\rangle_h.$$

When $h \equiv 1$ we remove the subscript so that $\langle\langle F, G\rangle\rangle := P(FG) - (PF)(PG)$, and $Q(F) :=
P(F^2) - (PF)^2$. It is well-known that $\sigma^2(F) := \pi(Q(ZF))$ is equal to the asymptotic variance
given in (37) [34, Theorem 17.5.3], where one version of the fundamental kernel Z
is given by $Z = [I - P + 1 \otimes \pi]^{-1}$; see [34, 32] for details.

The fundamental kernels $\{Z_h\}$ for $\{\check P_h\}$ and the quadratic forms $\{Q_h\}$ determine the
second-order Taylor series expansions for $\Lambda$, $\mathcal{G}$ and $\mathcal{H}$. We begin with an examination of $\Lambda$.

Proposition 4.3 Suppose that (DV3+) holds with an unbounded function W. Then the functional
$\Lambda$ is finite-valued on $L_\infty^{W_0}$, and has the following properties:

(i) $\Lambda$ is strongly continuous: For each $F_0 \in L_\infty^{W_0}$ there exists $B < \infty$, such that for all
$F \in L_\infty^{W_0}$ satisfying $\|F\|_{W_0} \le 1$,

$$|\Lambda(F_0 + F) - \Lambda(F_0)| \le B \|F\|_{W_0};$$

(ii) $\Lambda : L_\infty^{W_0} \to \mathbb{R}$ is smooth: For each $F_0, F \in L_\infty^{W_0}$, the function $\Lambda(F_0 + aF)$ is
analytic as a function of a. Moreover, we have the second-order Taylor expansion,

$$\Lambda(F_0 + aF) = \Lambda(F_0) + a\check\pi_g(F) + \tfrac12 a^2 \check\pi_g\big(Q_g(Z_g F)\big) + O(a^3), \qquad a \in \mathbb{R},$$

where $g = \check f_0 := e^{\mathcal{G}(F_0)}$, and $\check\pi_g$ is the invariant probability measure of $\check P_g$.

Proof. Part (i) follows from Proposition 2.10 combined with Lemma 3.3.

To establish (ii) we note that $\Lambda_n(F_0 + aF)$ is an analytic function of a for each initial x,
and $F_0, F \in L_\infty^{W_0}$. Proposition 3.6 states that this converges to $\Lambda(F_0 + aF)$, which is convex
and hence also continuous on $\mathbb{R}$, and the convergence is uniform for a in compact subsets of
$\mathbb{R}$. This implies that the limit is an analytic function of a.

The second-order Taylor series expansion follows as in the proof of property P4 in the
Appendix of [32]. □
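The second-order expansion of $\Lambda$ at $F_0 = 0$ can be sanity-checked on a two-state chain, where everything is available in closed form. The sketch below is an illustration only: the kernel `P` and the function `F` are hypothetical, $\Lambda(aF)$ is taken to be the log Perron root of the matrix $e^{aF(x)}P(x,y)$, and the derivatives of $\Lambda$ at the origin are approximated by central finite differences and compared with $\pi(F)$ and $\pi(Q(ZF))$, where $Z = [I - P + 1 \otimes \pi]^{-1}$.

```python
import math

P = [[0.7, 0.3], [0.4, 0.6]]   # hypothetical 2-state kernel
F = [1.0, -0.5]                # hypothetical functional
pi = [P[1][0] / (P[0][1] + P[1][0]), P[0][1] / (P[0][1] + P[1][0])]

def Lambda(a):
    """log Perron root of the scaled kernel e^{a F(x)} P(x, y)."""
    A = [[math.exp(a * F[x]) * P[x][y] for y in range(2)] for x in range(2)]
    tr = A[0][0] + A[1][1]
    disc = (A[0][0] - A[1][1]) ** 2 + 4.0 * A[0][1] * A[1][0]
    return math.log(0.5 * (tr + math.sqrt(disc)))

# Fundamental kernel Z = [I - P + 1 (x) pi]^{-1}, inverted explicitly (2x2).
A = [[(1.0 if x == y else 0.0) - P[x][y] + pi[y] for y in range(2)] for x in range(2)]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Z = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

Fhat = [sum(Z[x][y] * F[y] for y in range(2)) for x in range(2)]      # Z F
PFhat = [sum(P[x][y] * Fhat[y] for y in range(2)) for x in range(2)]  # P Z F
QFhat = [sum(P[x][y] * Fhat[y] ** 2 for y in range(2)) - PFhat[x] ** 2 for x in range(2)]

mean = sum(pi[x] * F[x] for x in range(2))         # pi(F)
sigma2 = sum(pi[x] * QFhat[x] for x in range(2))   # pi(Q(ZF))

h = 1e-3
d1 = (Lambda(h) - Lambda(-h)) / (2 * h)                    # approximates Lambda'(0)
d2 = (Lambda(h) - 2 * Lambda(0.0) + Lambda(-h)) / h ** 2   # approximates Lambda''(0)
```

In this toy setting `d1` should agree with `mean` and `d2` with `sigma2` up to the finite-difference error.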

We now consider $\mathcal{H}$, viewed as a nonlinear operator from $L_\infty^{W_0}$ to $L_\infty^{V}$. Proposition 4.4
establishes smoothness and pointwise convexity of $\mathcal{H}$, and Proposition 4.5 gives analogous
results for $\mathcal{G}$. See [6, Chapter 3] for related results for finite-dimensional positive matrices, and
various applications to optimization.

Proposition 4.4 Suppose that (DV3+) holds with an unbounded function W.

(i) $\mathcal{H} : L_\infty^{W_0} \to L_\infty^{V}$ is pointwise convex: For any $F_1, F_2 \in L_\infty^{W_0}$, and for any $\theta \in (0,1)$
we have,

$$\mathcal{H}(\theta F_1 + (1 - \theta)F_2) \le \theta\mathcal{H}(F_1) + (1 - \theta)\mathcal{H}(F_2),$$

where inequalities between functions are interpreted pointwise.

(ii) $\mathcal{H}$ is smooth: We have the second-order Taylor expansions, for any $F, F_0 \in L_\infty^{W_0}$,

$$\mathcal{H}(F_0 + aF) = \mathcal{H}(F_0) + a\mathcal{A}_g F + \tfrac12 a^2 Q_g(F) + O(a^3), \qquad a \in \mathbb{R},$$

where $g = f_0 := e^{F_0}$ and $\mathcal{A}_g$ is the generator of $\check P_g$.

Proof. We first show that $\mathcal{H} : L_\infty^{W_0} \to L_\infty^{V}$. To see this, take any $F \in L_\infty^{W_0}$. Since $W \in L_\infty^{V}$
and $W_0 \in L_\infty^{W}$ satisfies (6), we can find $b(F) < \infty$ such that $|F| \le V + b(F)$. It follows from
(DV3) that,

$$\log(Pe^F) \ge \log(Pe^{-V}) - b(F) \ge -(V - \delta W + b + b(F)),$$
$$\log(Pe^F) \le \log(Pe^{V}) + b(F) \le V - \delta W + b + b(F),$$

which shows that $\mathcal{H}(F) \in L_\infty^{V}$. Given these bounds, the smoothness result (ii) is a consequence
of elementary calculus.

To establish convexity, we let $H_i = \mathcal{H}(F_i)$ and $f_i = e^{F_i}$, so that $Pf_i = e^{H_i} f_i$, i = 1, 2. An
application of Hölder's inequality gives the bound,

$$P(f_1^\theta f_2^{1-\theta}) \le (Pf_1)^\theta (Pf_2)^{1-\theta} = \exp(\theta H_1 + (1 - \theta)H_2)\, f_1^\theta f_2^{1-\theta}.$$

With $F := \theta F_1 + (1 - \theta)F_2 = \log(f_1^\theta f_2^{1-\theta})$ and $f := e^F$ we then have

$$\mathcal{H}(F) = \log(Pf/f) \le \theta\mathcal{H}(F_1) + (1 - \theta)\mathcal{H}(F_2). \qquad □$$
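The Hölder argument above is easy to test numerically in a finite-state setting, where the nonlinear generator reads $\mathcal{H}(F)(x) = \log \sum_y P(x,y)e^{F(y)} - F(x)$. The sketch below is an illustration under assumptions: the 3-state kernel `P`, the sampling range, and the seed are hypothetical choices made for this example. It draws random pairs $(F_1, F_2)$ and weights $\theta$ and records the largest pointwise violation of the convexity inequality, which should be non-positive up to rounding.

```python
import math
import random

# Hypothetical 3-state kernel, for illustration only.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.1, 0.5]]

def H(F):
    """Nonlinear generator on a finite state space: log(P e^F) - F, pointwise."""
    n = len(P)
    return [math.log(sum(P[x][y] * math.exp(F[y]) for y in range(n))) - F[x]
            for x in range(n)]

random.seed(0)
worst = float("-inf")  # largest value of H(mix) - [theta H(F1) + (1 - theta) H(F2)]
for _ in range(200):
    F1 = [random.uniform(-3.0, 3.0) for _ in P]
    F2 = [random.uniform(-3.0, 3.0) for _ in P]
    theta = random.random()
    mix = [theta * u + (1.0 - theta) * v for u, v in zip(F1, F2)]
    lhs = H(mix)
    rhs = [theta * u + (1.0 - theta) * v for u, v in zip(H(F1), H(F2))]
    worst = max(worst, max(l - r for l, r in zip(lhs, rhs)))
```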

We can also obtain a Taylor-series approximation for $\mathcal{G}$, but it is convenient to consider a
re-normalization to avoid additive constants. Define,

$$\mathcal{G}_0(F) = \mathcal{G}(F) - \pi(\mathcal{G}(F)), \qquad F \in L_\infty^{W_0}.$$

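Proposition 4.5 Suppose that (DV3+) holds with an unbounded function W. For each $F_0 \in
L_\infty^{W_0}$, $0 < \eta \le 1$, there is $\epsilon_0 > 0$, $b_0 < \infty$, such that

$$\big\| e^{\mathcal{G}(F_0 + F)} - e^{\mathcal{G}(F_0)} \big\|_{v_\eta} \le b_0 \|F\|_{W_0},$$

whenever $\|F\|_{W_0} < \epsilon_0$. We have the Taylor series expansion,

$$\mathcal{G}_0(F_0 + aF) = \mathcal{G}_0(F_0) + aZ_{\check f_0} F + \tfrac12 a^2 Z_{\check f_0} Q_{\check f_0}(Z_{\check f_0} F) + O(a^3), \qquad a \in \mathbb{R},$$

where $Z_{\check f_0}$ is the fundamental kernel for $\check P_{\check f_0}$, normalized so that $\pi Z_{\check f_0} F = 0$, $F \in L_\infty^{W_0}$.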

PROOF. The strong continuity follows from strong continuity of $P_g$ given in Lemma 3.3.

The Taylor-series expansion is established first with $F_0 = 0$. Given $F \in L_\infty^{W_0}$, $a \in \mathbb{R}$, we
let $f_a = \exp(aF)$, and let $\check f_a$ be the solution to the eigenfunction equation given by

$$\check f_a = [I\lambda_a - P_{f_a} + 1 \otimes \pi]^{-1} 1.$$

Under assumption (DV3) alone we have seen in Theorem 3.4 that this is an eigenfunction
for small $|a|$. We also have $\check F_a = \log(\check f_a) = \mathcal{G}_0(F_a) + k(a)$, with $k(a) = \pi(\check F_a)$. In
the analysis that follows, our consideration will focus on $\check F_a$ rather than $\mathcal{G}_0(F_a)$ since constant
terms will be eliminated through our normalization.

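We note that the first derivative may be written explicitly as,

$$\frac{d}{da}\check f_a = -[I\lambda_a - P_{f_a} + 1 \otimes \pi]^{-1} \Big(\frac{d}{da}\lambda_a\, I - I_F P_{f_a}\Big) [I\lambda_a - P_{f_a} + 1 \otimes \pi]^{-1} 1.$$

Observe that the derivative is in $L_\infty^{v_\eta}$ since both $I_F P_{f_a}$ and $[I\lambda_a - P_{f_a} + 1 \otimes \pi]^{-1}$ are bounded
linear operators on $L_\infty^{v_\eta}$. Similar conclusions hold for all higher-order derivatives.
We define the twisted kernel as above,

$$\check P_a(x, A) := \check P_{\check f_a}(x, A) = \frac{\int_A P(x, dy)\, \check f_a(y)}{P\check f_a(x)}, \qquad x \in X,\ A \in \mathcal{B}.$$

As in [32] we may verify that the function $\check F_a' = \frac{d}{da}\check F_a$ is a solution to Poisson's equation,

$$\check P_a \check F_a' = \check F_a' - F + \check\pi_a(F), \qquad \check\pi_a(F) = \frac{d}{da}\Lambda(aF),$$

where $\check\pi_a$ is invariant for $\check P_a$. Setting a = 0 gives the first term in the Taylor series expansion
for $\mathcal{G}_0$.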
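To obtain an expression for the second term we differentiate Poisson's equation:

$$\frac{d}{da}\Big(-F + \frac{d}{da}\Lambda(aF)\Big) = \frac{d}{da}\Big(-\check F_a' + \check P_a \check F_a'\Big). \tag{58}$$

We wish to compute the second derivative, $\check F_a'' = \frac{d^2}{da^2}\log(\check f_a)$, which requires a formula for
the derivative of $\check P_a$: For any $G \in L_\infty^{V}$,

$$\frac{d}{da}(\check P_a G) = \frac{P(\check f_a' G)\, P\check f_a - P(\check f_a G)\,(P\check f_a')}{(P\check f_a)^2} = \check P_a(\check F_a' G) - (\check P_a G)(\check P_a \check F_a') = \langle\langle \check F_a', G\rangle\rangle_{\check f_a}. \tag{59}$$

Letting $H_a = \langle\langle \check F_a', \check F_a'\rangle\rangle_{\check f_a}$, the identities (58) and (59) then give,

$$\check P_a \check F_a'' = \check F_a'' - H_a + \Lambda''(aF). \tag{60}$$

Letting $Z_a$ denote the fundamental kernel for $\check P_a$ we conclude that

$$\check F_a'' - \check\pi_a(\check F_a'') = Z_a H_a = Z_a \langle\langle \check F_a', \check F_a'\rangle\rangle_{\check f_a}.$$

Evaluating all derivatives at the origin provides the quadratic approximation for $\mathcal{G}_0$,

$$\mathcal{G}_0(aF) = aZF + \tfrac12 a^2 Z[\langle\langle \hat F, \hat F\rangle\rangle] + O(a^3),$$

where Z is the fundamental kernel for P, normalized so that $\pi Z = 0$, and $\hat F = ZF$.

To establish the Taylor-series expansion at arbitrary $a_0 \in \mathbb{R}$ we repeat the above arguments,
applied to the Markov chain with transition kernel $\check P_{a_0}$. This satisfies (DV3+) with $\check V =
c + V - \check F_{a_0}$ for sufficiently large c > 0, by Proposition 2.11. □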

4.2 Representations of the Univariate Convex Dual 

The following result provides bounds on the (univariate) convex dual functional $\Lambda^*$, and gives
some alternative representations:

Proposition 4.6 Suppose that (DV3+) holds with an unbounded function W. Then, for any
probability measure $\mu \in \mathcal{M}_1^{W_0}$:

(i) $\Lambda^*(\mu) = \sup\{\langle \mu, F\rangle - \Lambda(F) : F \in L_\infty \text{ and } \check F \in L_\infty\}$.

(ii) $\Lambda^*(\mu) = \sup\{\langle \mu, -\mathcal{H}(H)\rangle : H \in L_\infty\}$.

(iii) There exists $\epsilon_0 > 0$, independent of $\mu \in \mathcal{M}_1^{W_0}$, such that

$$\Lambda^*(\mu) \ge \epsilon_0 \frac{\|\mu - \pi\|_{W_0}^2}{1 + \|\mu - \pi\|_{W_0}}.$$

(iv) If $\mu$ is not absolutely continuous with respect to $\pi$, then $\Lambda^*(\mu) = \infty$.

The proof is provided after the following bound.

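Lemma 4.7 Suppose that (DV3+) holds with an unbounded function W. Then, $\check F \in L_\infty$
provided the following conditions hold: $F \in L_\infty$; $\Lambda(F) = 0$; and $F = F I_{C_V(r)}$ for some $r \ge 1$.

Proof. From the local martingale property we have, with $\tau = \tau_{C_V(r)}$,

$$\check F(x) = \log E_x\Big[\exp\Big(\sum_{i=0}^{\tau - 1} F(\Phi(i))\Big) \check f(\Phi(\tau_{C_V(r)}))\Big] = F(x) + \log E_x\big[\check f(\Phi(\tau_{C_V(r)}))\big].$$

This then gives the bound, $\|\check F\|_\infty \le \|F\|_\infty + \|\check F I_{C_V(r)}\|_\infty < \infty$. □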

Proof of Proposition 4.6. For any $F \in L_\infty^{W_0}$, and any $r \ge 1$ we write $F_r = I_{C_V(r)}[F - \gamma_r]$,
where $\gamma_r \in \mathbb{R}$ is chosen so that $\Lambda(F_r) = 0$. Its existence follows from Proposition 4.3.

From Proposition C.5 we can show that j r — > A(i ? ), and then also that A(i ? r ) -> as 
$r \to \infty$. Consequently, $\Lambda^*(\mu) = \sup\{\langle \mu, F_r\rangle - \Lambda(F_r) : F \in L_\infty^{W_0},\ r \ge 1\}$, and Lemma 4.7
implies that $\check F_r \in L_\infty$ for each r, which completes the proof of (i).

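Part (ii) is essentially a reinterpretation of (i): From the equation $\mathcal{H}(\check F) = -F + \Lambda(F)$ and
part (i) we obtain the upper bound,

$$\Lambda^*(\mu) = \sup\{\langle \mu, F\rangle - \Lambda(F) : F \in L_\infty,\ \check F \in L_\infty\} \le \sup\{\langle \mu, -\mathcal{H}(\check F)\rangle : \check F \in L_\infty\} \le \sup\{\langle \mu, -\mathcal{H}(G)\rangle : G \in L_\infty\}.$$

Conversely, for any function $G \in L_\infty$, the function $F := -\mathcal{H}(G)$ satisfies $\Lambda(F) = 0$, $\check F \in L_\infty$.
This gives the desired lower bound, $\Lambda^*(\mu) \ge \langle \mu, F\rangle = \langle \mu, -\mathcal{H}(G)\rangle$, for $G \in L_\infty$.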

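Result (iii) is obtained from the mean value theorem, justified by Proposition 4.3: For
any $F \in L_\infty^{W_0}$, $\epsilon > 0$, there is $0 \le \bar\epsilon \le \epsilon$ such that $\Lambda(\epsilon F) = \epsilon\pi(F) + \tfrac12 \epsilon^2 \Lambda''(\bar\epsilon F)$. Let $B_0 =
\sup\{\Lambda''(\epsilon G) : \|G\|_{W_0} \le 1,\ 0 \le \epsilon \le 1\}$. Note that $B_0 < \infty$ by the Lemma following the proof.
Then, whenever $\|F\|_{W_0} \le 1$, $\epsilon \le 1$, we have $\Lambda(\epsilon F) \le \epsilon\pi(F) + \tfrac12 B_0 \epsilon^2$. The definition of the
convex dual then gives,

$$\epsilon\mu(F) = \langle \mu, \epsilon F\rangle \le \Lambda^*(\mu) + \Lambda(\epsilon F) \le \Lambda^*(\mu) + \epsilon\pi(F) + \tfrac12 B_0 \epsilon^2,$$

and since this holds for any $\|F\|_{W_0} \le 1$, we have the absolute bound,

$$|\mu(F) - \pi(F)| \le \frac{1}{\epsilon}\Lambda^*(\mu) + \tfrac12 B_0 \epsilon, \qquad \|F\|_{W_0} \le 1.$$

Letting $\epsilon = \sqrt{\Lambda^*(\mu)}$ we obtain

$$\|\mu - \pi\|_{W_0} = \sup_{\|F\|_{W_0} \le 1} |\mu(F) - \pi(F)| \le \sqrt{\Lambda^*(\mu)} + \tfrac12 B_0 \sqrt{\Lambda^*(\mu)}, \qquad \Lambda^*(\mu) \le 1,$$

which implies the desired lower bound on $\Lambda^*$.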
To prove (iv), write $\mu = p\mu_0 + (1 - p)\mu_1$ where $\mu_0, \mu_1$ are probability measures on $(X, \mathcal{B})$
such that $\mu_1 \prec \pi$ is absolutely continuous and $\mu_0$ is singular with respect to $\pi$. Let S denote
the support of $\mu_0$. We have $\Lambda(F) = 0$ whenever $F \in L_\infty^{W_0}$ is supported on S, and hence

$$\Lambda^*(\mu) \ge \sup\{\langle \mu, F\rangle - \Lambda(F) : F \in L_\infty^{W_0},\ F = I_S F\} = p \sup\{\langle \mu_0, F\rangle : F \in L_\infty^{W_0},\ F = I_S F\},$$

which is infinite, as claimed. □
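Lemma 4.8 $B_0 = \sup\{\frac{d^2}{da^2}\Lambda(aG) : \|G\|_{W_0} \le 1,\ 0 \le a \le 1\} < \infty$.

Proof (sketch). Let $\check P_a = \check P_{\check g_a}$ and let $\check\pi_a$ denote the invariant distribution for $\check P_a$, given $\|G\|_{W_0} \le
1$, and $a \in [0,1]$. We let $Z_a$ denote the fundamental kernel for $\check P_a$, normalized so that $\check\pi_a(Z_a G) =
\check\pi_a(G)$, and we let $\hat G_a = Z_a G$. Proposition 4.3 then gives the representation,

$$\Lambda''(aG) = \check\pi_a\big(Q_a(Z_a G)\big) = \check\pi_a\big(\check P_a(\hat G_a^2) - (\check P_a \hat G_a)^2\big).$$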

The proof is completed on showing that 

sup ||7r ||„ < oo, sup ||G ||v < oo, 

where the supremum is over all a and G in this class. This follows from the arguments above 
- see in particular (45) and the surrounding arguments. □ 

In the following proposition we give another characterization of dual pairs $(\mu, G)$ for $\Lambda^*$.

Proposition 4.9 Suppose that (DV3+) holds with an unbounded function W. We then have:

(i) For any $H \in L_\infty$, $\pi(\mathcal{H}(H)) \ge 0$, with equality if and only if $\mathcal{H}(H) = 0$, in which
case H is constant a.e. $[\pi]$.

(ii) If $\mu \in \mathcal{M}_1^{W_0}$ is not invariant under P then there is $H \in L_\infty$ satisfying $\mu(\mathcal{H}(H)) < 0$.

(iii) Suppose that $\mu \in \mathcal{M}_1^{W_0}$, and that there exists $G \in L_\infty$ satisfying,

$$\Lambda^*(\mu) = \langle \mu, -\mathcal{H}(G)\rangle = \sup\{\langle \mu, -\mathcal{H}(H)\rangle : H \in L_\infty\}.$$

Then $\mu$ is invariant under the twisted kernel $\check P_g$.

PROOF. The first result is simply Jensen's inequality:

$$\pi(\mathcal{H}(H)) = \int \log\big(E_x[\exp(H(\Phi(1)) - H(\Phi(0)))]\big)\, \pi(dx) \ge E_\pi[H(\Phi(1)) - H(\Phi(0))] = 0.$$

If equality holds, it then follows that $e^H$ is constant a.e. $[\pi]$.
To prove (ii) let $F(x) = \epsilon[I_A - \gamma_\epsilon I_B]$, with $\epsilon > 0$, $A, B \in \mathcal{B}^+$ small sets such that
$\sup_{x \in A \cup B} V(x)$ is finite, and $\gamma_\epsilon > 0$ is chosen so that $\Lambda(F) = 0$. The function $H := \check F = \mathcal{G}(F)$ is
then bounded, by Lemma 4.7. Moreover,

$$\mu(\mathcal{H}(H)) = -\mu(F) = -\epsilon[\mu(A) - \gamma_\epsilon \mu(B)].$$

Under (DV3+) we may apply Proposition 4.3 to justify the Taylor series expansion,

$$0 = \Lambda(F) = \epsilon[\pi(A) - \gamma_\epsilon \pi(B)] + O(\epsilon^2),$$

which gives $\gamma_\epsilon = \pi(A)/\pi(B) + O(\epsilon)$. Choosing A, B so that $\mu(A)/\mu(B) > \pi(A)/\pi(B)$ we see
that this function H satisfies the desired bound for $\epsilon > 0$ sufficiently small.

We now prove (iii). Applying Proposition 4.6 (ii), the convex dual $\check\Lambda^*$ for the kernel $\check P_g$
may be expressed as




For any $H \in L_\infty$ set $H' = H + G$ so that,

A*(aO = -K, eC o(^iog(^^))) 

Thus $\mu$ is invariant for $\check P_g$, by (i). □
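Part (i) of Proposition 4.9 can be illustrated on a finite chain, where $\pi(\mathcal{H}(H)) = \sum_x \pi(x)\,[\log \sum_y P(x,y)e^{H(y)} - H(x)]$. The sketch below is a toy check, with a hypothetical two-state kernel `P` and arbitrary test functions: the quantity should be non-negative for every H and vanish when H is constant.

```python
import math

P = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical 2-state kernel
pi = [P[1][0] / (P[0][1] + P[1][0]), P[0][1] / (P[0][1] + P[1][0])]  # invariant law

def H_gen(Hf):
    """Nonlinear generator applied to Hf: log(P e^Hf) - Hf, pointwise."""
    return [math.log(sum(P[x][y] * math.exp(Hf[y]) for y in range(2))) - Hf[x]
            for x in range(2)]

def pi_of_H(Hf):
    """The integral pi(H(Hf))."""
    return sum(pi[x] * v for x, v in enumerate(H_gen(Hf)))

# Non-constant test functions give strictly positive values here.
values = [pi_of_H(Hf) for Hf in ([1.0, -1.0], [0.0, 2.5], [-3.0, 0.2], [0.4, 0.5])]

# A constant function gives zero, the equality case of Jensen's inequality.
value_constant = pi_of_H([1.3, 1.3])
```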

4.3 Characterization of the Bivariate Convex Dual 

We now turn to the case of bivariate functions and measures. 

Given any function of two variables $M : X \times X \to \mathbb{R}$, we let $m = e^M$ and extend the
definition of the scaled kernel in (20) via,

$$P_m(x, dy) := m(x,y)P(x, dy), \qquad x, y \in X.$$

The following result shows that the spectral radius of this kernel coincides with that defined
for the bivariate chain $\Psi$. The proof is routine.

Proposition 4.10 Suppose that $P_m$ has finite spectral radius $\lambda_m$ in $v_\eta$-norm for all sufficiently
small $\eta > 0$. Let $P_2$ denote the transition kernel for the bivariate chain $\Psi$.

(i) $I_m P_2$ has the same spectral radius in $v_{\eta 2}$-norm for sufficiently small $\eta > 0$, with
$v_{\eta 2}(x,y) = \exp(\eta[V(y) + \tfrac12 \delta W(x)])$.

(ii) If $P_m$ has an eigenfunction f, then $I_m P_2$ also possesses an eigenfunction given by,

$$f_2(x_1, x_2) = m(x_1, x_2) f(x_2).$$





For a Markov process with transition kernel P satisfying (DV3+), we say that M and $\tilde M$
are similar if there exists a function H such that

$$\tilde M(x, y) = M(x, y) + H(x) - H(y) \qquad \text{a.e. } (x, y) \in X \times X\ [\pi \odot P].$$

The function M is called degenerate if it is similar to $\tilde M \equiv 0$. The log-generalized principal
eigenvalues agree ($\Lambda(M) = \Lambda(\tilde M)$) whenever $M, \tilde M$ are similar. This is the basis of the
following two lemmas.

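Lemma 4.11 Suppose that (DV3+) holds with an unbounded function W. If $\Gamma \in \mathcal{M}_{1,2}^{W_0}$ is a
probability measure with $\Lambda^*(\Gamma) < \infty$, then $\Gamma \prec \pi \odot P$, and the one-dimensional marginals of
$\Gamma$ agree.

PROOF. The conclusion that $\Gamma \prec \pi \odot P$ follows from Proposition 4.6 (iv).

For any $M \in L_\infty(X \times X)$, $H \in L_\infty$, we have $\Lambda(M) = \Lambda(\tilde M)$, where $\tilde M(x,y) := M(x,y) +
H(x) - H(y)$. Hence, for all such M, H,

$$\Lambda^*(\Gamma) \ge \langle \Gamma, \tilde M\rangle - \Lambda(\tilde M) = \langle \Gamma, M\rangle - \Lambda(M) + \langle \Gamma_1, H\rangle - \langle \Gamma_2, H\rangle,$$

where $\Gamma_1$ and $\Gamma_2$ denote the two marginals. If $\Gamma_1 \ne \Gamma_2$ it is obvious that the right hand side
cannot be bounded in H. □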



Lemma 4.12 Suppose that (DV3+) holds with an unbounded function W. Suppose moreover
that $M \in L_{\infty,2}^{W_0}$ and that the asymptotic variance of the partial sums $\sum_{k=0}^{n-1} M(\Phi(k), \Phi(k+1))$,
$n \ge 1$, is equal to zero. Then the function M is degenerate.

Proof. Applying [32, Proposition 2.4] to the bivariate chain $\Psi$ with transition kernel $P_2$, we
can find $\hat M$ such that

$$\hat M(\Phi(k), \Phi(k+1)) - \hat M(\Phi(k-1), \Phi(k)) = -M(\Phi(k-1), \Phi(k)) + \pi_2(M) \qquad \text{a.s. } [P_\pi],\ k \ge 1,$$

where $\pi_2 = \pi \odot P$ is the invariant probability measure for $P_2$. Since $\Phi(k+1)$ is conditionally
independent of $\Phi(k-1)$ given $\Phi(k)$, it follows that $\hat M$ does not depend on its first variable.
Thus we can find F satisfying

$$F(\Phi(k+1)) - F(\Phi(k)) = -M(\Phi(k-1), \Phi(k)) + \pi_2(M) \qquad \text{a.s. } [P_\pi],\ k \ge 1;$$

therefore, M is similar to the constant function $\pi_2(M)$:

$$M(x,y) = \pi_2(M) + G(x) - G(y), \qquad \text{a.e. } \pi \odot P,$$

with $G(x) = PF(x)$. □
Theorem 4.13 (Identification of Dual Pairs) Suppose that (DV3+) holds with an unbounded
function W.

(i) Assume that $M \in L_{\infty,2}^{W_0}$ and $\Gamma \in \mathcal{M}_{1,2}^{W_0}$ are given, such that $\Lambda^*(\Gamma) < \infty$ and $(M, \Gamma)$
is a dual pair, i.e., $\langle \Gamma, M\rangle = \Lambda(M) + \Lambda^*(\Gamma)$. Define $M_0$ as the logarithm of the Radon-Nikodym
derivative,

$$M_0(x,y) = \log\Big(\frac{d\Gamma}{d[\check\pi \odot P]}(x,y)\Big), \qquad x, y \in X,$$

where $\check\pi$ is a marginal of $\Gamma$ (see Lemma 4.11). Then, the function $M_0$ is similar to
$M - \Lambda(M)$,

$$M_0(x, y) = M(x, y) - \Lambda(M) - \check F(x) + \check F(y),$$

where $\check F = \log(\check f)$, with $\check f$ equal to an eigenfunction for $P_m$, with eigenvalue $\lambda(M)$.

(ii) Conversely, suppose that $\Gamma \in \mathcal{M}_{1,2}^{W_0}$ is given, satisfying $\Gamma \prec [\pi \odot P]$, and suppose
that its one-dimensional marginals agree. Consider the decomposition, $\Gamma(dx, dy) =
[\check\pi \odot \check P](dx, dy)$, where $\check\pi := \Gamma_1 = \Gamma_2$ is the (common) first marginal of $\Gamma$ on $(X, \mathcal{B})$,
and $\check P$ is a transition kernel. Let

$$M(x, y) = \log\Big(\frac{d\Gamma}{d[\check\pi \odot P]}(x, y)\Big), \qquad x, y \in X.$$

If $M \in L_{\infty,2}^{W_0}$, then $\Lambda^*(\Gamma)$ is finite and $(M, \Gamma)$ is a dual pair.

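Proof. Part (i) is a bivariate version of Proposition 4.9: We know that $\Gamma$ is an invariant
measure for a bivariate process, whose one-dimensional transition kernel is of the form,

$$\check P_m(x, dy) = e^{M(x,y) - \Lambda(M) - \check F(x) + \check F(y)} P(x, dy).$$

Invariance may be expressed as follows:

$$\Gamma(dy, dz) = \int_{x \in X} \Gamma(dx, dy)\, \check P_m(y, dz), \qquad y, z \in X.$$

Since $\Gamma$ has equal marginals, denoted $\check\pi$, this identity may be expressed,

$$\check\pi(dy)\check P(y, dz) = \check\pi(dy)\check P_m(y, dz), \qquad y, z \in X,$$

which is the desired identity in (i).

To prove (ii), let $\check\Lambda(\cdot)$ denote the functional defining the log-generalized principal eigenvalue
for the transition kernel $\check P = \check P_m$. Proposition 2.11 gives $\check\Lambda(N) = \Lambda(N + M) - \Lambda(M)$, for any
$N \in L_{\infty,2}^{W_0}$. We can then write,

$$\Lambda^*(\Gamma) = \sup_{N} \big(\langle \Gamma, N\rangle - \Lambda(N)\big) = \sup_{N} \big(\langle \Gamma, N + M\rangle - \Lambda(N + M)\big) = \sup_{N} \big(\langle \Gamma, N\rangle + \langle \Gamma, M\rangle - \check\Lambda(N) - \Lambda(M)\big) = \check\Lambda^*(\Gamma) + \langle \Gamma, M\rangle - \Lambda(M),$$

where each supremum is over $N \in L_{\infty,2}^{W_0}$. We have $\check\Lambda^*(\Gamma) = 0$ by Proposition 4.9, and consequently
$\langle \Gamma, M\rangle = \Lambda(M) + \Lambda^*(\Gamma)$. This shows that $(M, \Gamma)$ is a dual pair. □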
5 Large Deviations Asymptotics

In this section we use the multiplicative mean ergodic theorems of Section 3 and the structural
results of Section 4 to study the large deviations properties of the empirical measures $\{L_n\}$
induced by the Markov chain $\Phi$ on $(X, \mathcal{B})$; recall the definition of $\{L_n\}$ in (14).

As in the previous section, we also assume throughout this section that the Markov chain
$\Phi$ satisfies (DV3+) with an unbounded function W, and we choose and fix a function $W_0 :
X \to [1, \infty)$ in $L_\infty^{W}$ as in (34). Our first result, the large deviations principle (LDP) for the
sequence of measures $\{L_n\}$, will be established in a topology finer (and hence stronger) than
either the topology of weak convergence, or the $\tau$-topology. As described in the Introduction,
we consider the $\tau^{W_0}$-topology on the space $\mathcal{M}_1$ of probability measures on $(X, \mathcal{B})$, defined by
the system of neighborhoods (16).

Since the map $(x_1, \ldots, x_n) \mapsto \frac{1}{n}\sum_{i=1}^n \delta_{x_i}$ from $X^n$ to $\mathcal{M}_1$ may not be measurable with
respect to the natural Borel $\sigma$-field induced by the $\tau^{W_0}$-topology on $\mathcal{M}_1$, we will instead
consider the (smaller) $\sigma$-field $\mathcal{F}$, defined as the smallest $\sigma$-field that makes all the maps below
measurable:

$$\nu \mapsto \int F\, d\nu, \qquad \text{for real-valued } F \in L_\infty^{W_0}. \tag{61}$$

Theorem 5.1 (LDP for Empirical Measures) Suppose that $\Phi$ satisfies (DV3+) with an
unbounded function W. Then, for any initial condition $\Phi(0) = x$, the sequence of empirical
measures $\{L_n\}$ satisfies the LDP in the space $(\mathcal{M}_1, \mathcal{F})$ equipped with the $\tau^{W_0}$-topology, with
the good, convex rate function

$$I(\nu) := \inf_{\check P} H(\nu \odot \check P \,\|\, \nu \odot P), \tag{62}$$

where the infimum is over all transition kernels $\check P$ for which $\nu$ is an invariant measure, and
$\nu \odot \check P$ denotes the bivariate measure $[\nu \odot \check P](dx, dy) := \nu(dx)\check P(x, dy)$ on $(X \times X, \mathcal{B} \times \mathcal{B})$: Writing
$\mu_{n,x}$ for the law of the empirical measure $L_n$ under the initial condition $\Phi(0) = x$, then for
any $E \in \mathcal{F}$,

$$-\inf_{\nu \in E^\circ} I(\nu) \le \liminf_{n \to \infty} \frac{1}{n} \log \mu_{n,x}(E) \le \limsup_{n \to \infty} \frac{1}{n} \log \mu_{n,x}(E) \le -\inf_{\nu \in \bar E} I(\nu),$$

where $E^\circ$ and $\bar E$ denote the interior and the closure of E in the $\tau^{W_0}$-topology, respectively.

The proof is based on an application of the Dawson-Gärtner projective limit theorem along
the same lines as the proof of Theorem 6.2.10 in [12]. The two main technical ingredients are,
first, the multiplicative mean ergodic theorem, Theorem 3.1 (iii), which, as noted
in (36), shows that the log-moment generating functions converge to $\Lambda$; and second, the
regularity properties of $\Lambda$ and the identification of $\Lambda^*$ in terms of relative entropy, established
in Section 4 and Section C of the Appendix.

As in Section 4, in order to identify the rate function for the LDP we find it easier to
consider the bivariate chain $\Psi$. Recall the bivariate extensions of our earlier definitions from
equations (51), (52), (53) and (54).
Proof of Theorem 5.1. We begin by establishing an LDP for $\Phi$ with rate function given
by $\Lambda^*$. Recall that Proposition 3.6 gives

$$\Lambda_n(F) := \frac{1}{n} \log E_x\big[\exp(n\langle L_n, F\rangle)\big] \to \Lambda(F), \qquad n \to \infty. \tag{63}$$

In order to apply the projective limit theorem we need to extend the domain of the convex
dual functional $\Lambda^*$ as follows. For probability measures $\nu \in \mathcal{M}_1^{W_0}$, $\Lambda^*(\nu)$ is defined in (50),
and the same definition applies when $\nu$ is a probability measure not necessarily in $\mathcal{M}_1^{W_0}$.
More generally, let $L'$ denote the algebraic dual of the space $L = L_\infty^{W_0}$, consisting of all linear
functionals $\Theta : L \to \mathbb{R}$, and equipped with the weakest topology that makes the functional

$$\Theta \mapsto \Theta(F) = \langle \Theta, F\rangle : L' \to \mathbb{R}$$

continuous, for each $F \in L_\infty^{W_0}$. Note that each probability measure $\nu$ on $(X, \mathcal{B})$ induces a
linear functional $\Theta_\nu : L \to \mathbb{R}$ via

$$\langle \Theta_\nu, F\rangle = \langle \nu, F\rangle = \int F\, d\nu.$$

Therefore, we can identify the space of probability measures $\mathcal{M}_1$ with the corresponding subset
of $L'$, and observe that the induced topology on $\mathcal{M}_1$ is simply the $\tau^{W_0}$-topology.
Next, extend the definition of $\Lambda^*$ to all $\Theta \in L'$ via

$$\Lambda^*(\Theta) = \sup\{\langle \Theta, F\rangle - \Lambda(F) : F \in L_\infty^{W_0}\}, \tag{64}$$

and observe that [12, Assumption 4.6.8] is satisfied by construction (with $\mathcal{W} = L = L_\infty^{W_0}$, $\mathcal{X} =
L'$ and $\mathcal{B} = \mathcal{F}$), and that by Proposition 4.3 the function $\Lambda(F_0 + aF)$ is Gâteaux differentiable.
Therefore, we can apply the Dawson-Gärtner projective limit theorem [12, Corollary 4.6.11 (a)]
to obtain that the sequence of empirical measures $\{L_n\}$ satisfies the LDP in the space $L'$ with
respect to the convex, good rate function $\Lambda^*$. Moreover, since by Proposition C.9 we know
that $\Lambda^*(\Theta) = \infty$ for $\Theta \notin \mathcal{M}_1$, we obtain the same LDP in the space $(\mathcal{M}_1, \mathcal{F})$, with respect to
the induced topology, namely, the $\tau^{W_0}$-topology; see, e.g., [12, Lemma 4.1.5].

Next note that, in view of Proposition 4.1, the bivariate chain $\Psi$ also satisfies the same
LDP. But in this case, we claim that we can express $\Lambda^*(\Gamma)$ for any bivariate probability measure
$\Gamma$ as follows:

$$\Lambda^*(\Gamma) = \begin{cases} H(\Gamma \,\|\, \Gamma_1 \odot P), & \text{if the two marginals } \Gamma_1 \text{ and } \Gamma_2 \text{ of } \Gamma \text{ agree;} \\ \infty, & \text{otherwise.} \end{cases}$$

To see this, first consider the case when $\Gamma_1 \ne \Gamma_2$; then Theorem 4.2 (ii) and Proposition C.10
imply that $\Lambda^*(\Gamma) = \infty$. Suppose now that $\Gamma_1 = \Gamma_2$. Then Proposition C.10 shows that $\Lambda^*(\Gamma)$
must equal $H(\Gamma \,\|\, \Gamma_1 \odot P)$ whenever $\Lambda^*(\Gamma) = \infty$. And if the marginals agree and $\Lambda^*(\Gamma)$ is finite,
then the identification follows from Theorem 4.2 (iii).

Finally, an application of the contraction principle [12, Theorem 4.2.1] implies that the
univariate convex dual $\Lambda^*(\nu)$ coincides with $I(\nu)$ in (62). Simply note that the $\tau^{W_0}$-topology
on the space of probability measures is Hausdorff, and that the map $\Gamma \mapsto \Gamma_1$ is continuous in
that topology. □
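For a finite chain the two descriptions of the rate function met in this proof can be compared directly: the variational form $\sup_H \langle \nu, -\mathcal{H}(H)\rangle$ of Proposition 4.6 (ii), and the entropy form (62). The sketch below does this for a hypothetical two-state kernel `P` and target measure `nu`; these values, and the use of a simple ternary search on each one-dimensional problem, are assumptions of the example and not part of the paper. On two states, H may be normalized as $(0, h)$, and every kernel with $\nu$ invariant is determined by the single flow parameter $t = \nu(0)\check P(0,1) = \nu(1)\check P(1,0)$.

```python
import math

P = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical 2-state kernel
nu = [0.3, 0.7]               # hypothetical target measure

def dv_objective(h):
    """<nu, -H(H)> for H = (0, h): sum_x nu(x) [H(x) - log (P e^H)(x)]."""
    Hf = [0.0, h]
    return sum(nu[x] * (Hf[x] - math.log(sum(P[x][y] * math.exp(Hf[y]) for y in range(2))))
               for x in range(2))

def entropy_objective(t):
    """H(nu (.) Pc || nu (.) P), where Pc leaves nu invariant and nu(0) Pc(0,1) = t."""
    Pc = [[1.0 - t / nu[0], t / nu[0]], [t / nu[1], 1.0 - t / nu[1]]]
    return sum(nu[x] * Pc[x][y] * math.log(Pc[x][y] / P[x][y])
               for x in range(2) for y in range(2) if Pc[x][y] > 0.0)

def ternary(f, lo, hi, maximize, iters=200):
    """Optimize a unimodal function on [lo, hi] and return the optimal value."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (f(m1) < f(m2)) == maximize:
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))

sup_dv = ternary(dv_objective, -20.0, 20.0, maximize=True)
inf_entropy = ternary(entropy_objective, 0.0, min(nu), maximize=False)
```

Both searches rely on concavity (respectively convexity) of the one-dimensional objectives, which holds here because $\mathcal{H}$ is pointwise convex and relative entropy is convex in its first argument.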
Theorem 5.1 strengthens the "local" large deviations of [32] to a full LDP. The assumptions
under which this LDP is proved are more restrictive than those in [32], but apparently they
cannot be significantly relaxed. In particular, the density assumption of (DV3+) (ii) cannot be
removed, as illustrated by the counter-example given in [18]. This example is of an irreducible,
aperiodic Markov chain with state space X = [0, 1], satisfying Doeblin's condition. It can be
easily seen that this Markov chain satisfies condition (DV3) with Lyapunov function $V(x) =
-\tfrac12 \log x$, $x \in [0, 1]$, and with W given by



W{x) :-- 



'2 " log ~ } for x G [0, 1/2); 

^2-log(2Vx) for x G [1/2,1]. 



Taking $\delta = 1$, C = [0, 1] and b = 2 yields a solution to (DV3), with the Lyapunov function V
and the unbounded function W as above. But for this Markov chain the density assumption
in (DV3+) (ii) is not satisfied, and as shown in [18], it satisfies the LDP with a rate function
different from the one in Theorem 5.1.

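The LDP of Theorem 5.1 can easily be extended to the sequence of empirical measures of
k-tuples $L_{n,k}$, defined for each $k \ge 2$ by

$$L_{n,k} := \frac{1}{n} \sum_{t=0}^{n-1} \delta_{(\Phi(t), \Phi(t+1), \ldots, \Phi(t+k-1))}, \qquad n \ge 1. \tag{65}$$

We write $\mathcal{M}_{1,k}$ for the space of all probability measures on $(X^k, \mathcal{B}^k)$, and we let $\mathcal{F}_k$ denote the
$\sigma$-field of subsets of $\mathcal{M}_{1,k}$ defined analogously to $\mathcal{F}$ in (61), with $X^k$ in place of X, and with
real-valued functions F in the space

$$L_{\infty,k}^{W_0} := \Big\{F : X^k \to \mathbb{C} : \|F\|_{W_0} := \sup_{(x_1, \ldots, x_k) \in X^k} \frac{|F(x_1, \ldots, x_k)|}{W_0(x_1) + \cdots + W_0(x_k)} < \infty\Big\}$$

instead of $L_\infty^{W_0}$. Similarly, the $\tau^{W_0}$-topology on $\mathcal{M}_{1,k}$ is defined by the system of neighborhoods

$$N_F(c, \delta) := \{\nu \in \mathcal{M}_{1,k} : |\nu(F) - c| < \delta\}, \qquad \text{for real-valued } F \in L_{\infty,k}^{W_0},\ c \in \mathbb{R},\ \delta > 0.$$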

A straightforward generalization of the argument in the above proof yields the following 
corollary. The proof is omitted. 

Corollary 5.2 Under the assumptions of Theorem 5.1, for any initial condition $\Phi(0) = x$,
the sequence of empirical measures $\{L_{n,k}\}$ satisfies the LDP in the space $(\mathcal{M}_{1,k}, \mathcal{F}_k)$ equipped
with the $\tau^{W_0}$-topology, with the good, convex rate function

$$I_k(\nu_k) = \begin{cases} H(\nu_k \,\|\, \nu_{k-1} \odot P), & \text{if } \nu_k \text{ is shift-invariant;} \\ \infty, & \text{otherwise,} \end{cases}$$

where $\nu_{k-1}$ denotes the first (k − 1)-dimensional marginal of $\nu_k$.
Next we show that under the assumptions of Theorem 5.1 it is possible to obtain exact
large deviations results for the partial sums $S_n$,

$$S_n := \sum_{t=0}^{n-1} F(\Phi(t)) = n\langle L_n, F\rangle, \qquad n \ge 1, \tag{66}$$

of a real-valued functional $F \in L_\infty^{W_0}$. In the next two theorems we prove analogs of the
corresponding expansions of Bahadur and Ranga Rao for the partial sums of independent
random variables [1]. Our results generalize those obtained by Miller [36] for finite state
Markov chains, and those in [32] proved for geometrically ergodic Markov processes but only
in a neighborhood of the mean; see [32] for further bibliographical references.

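First we note that, since for any $F \in L_\infty^{W_0}$ the map $\nu \mapsto \langle \nu, F \rangle$ from $\mathcal{M}_1$ to $\mathbb{R}$ is continuous
under the $\tau^{W_0}$ topology, we can apply the contraction principle to obtain an LDP for the
partial sums $\{S_n\}$ in (66): Their laws satisfy the LDP on $\mathbb{R}$ with respect to the good, convex
rate function $J(c)$ as in (19),

$$\begin{aligned}
J(c) &= \inf\{ I(\nu) : \nu \text{ is a probability measure on } (\mathsf{X},\mathcal{B}) \text{ satisfying } \nu(F) \ge c \} \\
&= \inf\{ H(\Gamma \,\|\, \Gamma_1 \odot P) : \Gamma \in \mathcal{M}_{1,2} \text{ with marginals } \Gamma_1 = \Gamma_2 \text{ such that } \Gamma_1(F) \ge c \}.
\end{aligned}$$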

Alternatively, based on (the weak version of) the multiplicative mean ergodic theorem in
(63), we can apply the Gärtner-Ellis theorem [12, Theorem 2.3.6] to conclude that the laws of
the partial sums $\{S_n\}$ satisfy the LDP on $\mathbb{R}$ with respect to the good rate function $J^*(c)$,

$$J^*(c) := \sup_{a \in \mathbb{R}} [ac - \Lambda(aF)], \qquad c \in \mathbb{R}, \tag{67}$$

so that, in particular, $J(c) = J^*(c)$ for all $c$.
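To make (67) concrete, here is a minimal numerical sketch for a two-state chain, where $\Lambda(aF)$ reduces to the logarithm of the Perron-Frobenius eigenvalue of the tilted matrix $e^{aF(x)}P(x,y)$. The kernel `P`, the functional `F`, the search interval and the function names are assumptions made for this illustration; the finite-state setting is far more restrictive than the one treated in the text.

```python
import math

def log_pf_eigenvalue(P, F, a):
    """Lambda(aF): log Perron-Frobenius eigenvalue of the 2x2 kernel e^{a F(x)} P(x, y)."""
    m00, m01 = math.exp(a * F[0]) * P[0][0], math.exp(a * F[0]) * P[0][1]
    m10, m11 = math.exp(a * F[1]) * P[1][0], math.exp(a * F[1]) * P[1][1]
    # Largest root of the characteristic polynomial; the discriminant is
    # written as a sum of non-negative terms to avoid cancellation.
    disc = (m00 - m11) ** 2 + 4.0 * m01 * m10
    return math.log((m00 + m11 + math.sqrt(disc)) / 2.0)

def rate(P, F, c, lo=-20.0, hi=20.0, iters=200):
    """J*(c) = sup_a [a c - Lambda(aF)] by ternary search (the objective is concave in a)."""
    def g(a):
        return a * c - log_pf_eigenvalue(P, F, a)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))

P = [[0.7, 0.3], [0.4, 0.6]]   # hypothetical kernel with invariant law (4/7, 3/7)
F = [-3.0, 4.0]                # chosen so that pi(F) = 0
rates = [rate(P, F, c) for c in (0.0, 1.0, 2.0)]
```

In this example the rate vanishes at the mean $c = \pi(F) = 0$ and increases for $c > 0$, in line with the convexity of $J^*$.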

Now suppose for simplicity that the function $F$ has zero mean $\pi(F) = 0$ and nontrivial
central limit theorem variance $\sigma^2(F) > 0$; recall the definition of $\sigma^2(F)$ from Section 3.1. To
evaluate the supremum in (67), we recall from Lemma 2.10 that $\Lambda(aF)$ is convex in $a \in \mathbb{R}$,
and since by Theorem 3.1 it is also analytic, it is strictly convex. Therefore, if we define

$$F_{\max} := \lim_{a \to \infty} \frac{d}{da}\Lambda(aF) = \sup_{a \in \mathbb{R}} \frac{d}{da}\Lambda(aF),$$

then $J^*(c) = \infty$ for values of $c$ larger than $F_{\max}$, and the probabilities of the large deviations
events $\{S_n \ge nc\}$ decay to zero super-exponentially fast.

Therefore, from now on we concentrate on the interesting range of values $0 < c < F_{\max}$.
Note that, although in the case of independent and identically distributed random variables
it is easy to identify $F_{\max}$ as the right endpoint of the support of $F$, for Markov chains this
need not be the case, as illustrated by the following example.

Example. Let $\Phi = \{\Phi(n) : n \ge 0\}$ be a discrete-time version of the Ornstein-Uhlenbeck
process in $\mathbb{R}^2$, with $\Phi(0) = x \in \mathbb{R}^2$ and,

$$\begin{pmatrix} \Phi_1(n+1) \\ \Phi_2(n+1) \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -a_2 & -a_1 \end{pmatrix} \begin{pmatrix} \Phi_1(n) \\ \Phi_2(n) \end{pmatrix} + \begin{pmatrix} 0 \\ N(n+1) \end{pmatrix},$$

where $\{N(k)\}$ is a sequence of independent and identically distributed $N(0,1)$ random variables.
Let $A$ denote the above 2-by-2 matrix, and assume that the roots of the quadratic
equation $z^2 + a_1 z + a_2 = 0$ lie within the open unit disk in $\mathbb{C}$.

Note that there exists $\gamma < 1$ and a positive definite matrix $P$ satisfying $A^T P A \le \gamma P$. One
may take $P = \sum_{k=0}^{\infty} \gamma^{-k} (A^k)^T A^k$, where $\gamma < 1$ is chosen so that the sum is convergent.
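The series construction of $P$ can be checked numerically. The sketch below, with hypothetical coefficients $a_1 = -0.5$, $a_2 = 0.06$ (roots $0.2$ and $0.3$) and $\gamma = 0.5$, truncates the series and compares $A^T P A$ with $\gamma(P - I)$; the identity $A^T P A = \gamma (P - I)$ follows by shifting the summation index, and it gives $A^T P A \le \gamma P$ in the positive semidefinite order. The companion form of $A$ used here is an assumption about the example's matrix.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

a1, a2 = -0.5, 0.06                  # hypothetical: z^2 + a1 z + a2 has roots 0.2 and 0.3
A = [[0.0, 1.0], [-a2, -a1]]
gamma = 0.5                          # must exceed the squared spectral radius 0.09

P = [[0.0, 0.0], [0.0, 0.0]]
Ak = [[1.0, 0.0], [0.0, 1.0]]        # holds A^k, starting at k = 0
for k in range(200):
    term = matmul(transpose(Ak), Ak)                 # (A^k)^T A^k
    P = [[P[i][j] + term[i][j] / gamma ** k for j in range(2)] for i in range(2)]
    Ak = matmul(A, Ak)

lhs = matmul(matmul(transpose(A), P), A)             # A^T P A
rhs = [[gamma * (P[i][j] - (1.0 if i == j else 0.0)) for j in range(2)] for i in range(2)]
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Up to truncation error, `lhs` and `rhs` agree entrywise.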

Then $\Phi$ satisfies (DV3+) (i) with Lyapunov function $V(x) = 1 + \epsilon x^T P x$, and $W = V$,
for suitably small $\epsilon > 0$ (hence, the drift condition (DV4) also holds). Condition (DV3+) (ii)
holds with $T_0 = 2$ since $P^2(x, \cdot\,)$ has a Gaussian distribution with full-rank covariance.

Consider the functions

$$F_+(x) = I\{|x_1| \le 1\}, \qquad F_0(x) = x_2 - x_1, \qquad F(x) = F_+(x) + F_0(x), \qquad x = (x_1, x_2)^T \in \mathbb{R}^2.$$

The asymptotic variance of $F_0$ is zero, and for any initial condition we have

$$\sum_{t=0}^{n-1} F(\Phi(t)) = \sum_{t=0}^{n-1} F_+(\Phi(t)) + [\Phi_2(n-1) - x_1].$$

We conclude that $F_{\max} = (F_+)_{\max} = 1$, although $\pi\{F > c\} > 0$ for each $c > 0$ under the
invariant distribution $\pi$.
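The telescoping identity behind this example is easy to confirm by simulation. The sketch below assumes the companion-form recursion $\Phi_1(n+1) = \Phi_2(n)$, $\Phi_2(n+1) = -a_2\Phi_1(n) - a_1\Phi_2(n) + N(n+1)$; the coefficients, the initial condition and the random seed are hypothetical. Under this recursion $F_0(\Phi(t)) = \Phi_1(t+1) - \Phi_1(t)$, so the partial sums of $F_0$ collapse to $\Phi_2(n-1) - x_1$.

```python
import random

def simulate(a1, a2, x, n, rng):
    """Simulate Phi(0), ..., Phi(n) for the two-dimensional linear recursion."""
    path = [tuple(x)]
    for _ in range(n):
        p1, p2 = path[-1]
        path.append((p2, -a2 * p1 - a1 * p2 + rng.gauss(0.0, 1.0)))
    return path

rng = random.Random(0)
x = (0.3, -1.2)                      # hypothetical initial condition
n = 1000
path = simulate(-0.5, 0.06, x, n, rng)

F0_sum = sum(p2 - p1 for p1, p2 in path[:n])                  # sum of F_0(Phi(t)), t < n
Fplus_sum = sum(1 for p1, _ in path[:n] if abs(p1) <= 1.0)    # sum of F_+(Phi(t)), t < n
telescoped = path[n - 1][1] - x[0]                            # Phi_2(n-1) - x_1
```

Since the $F_0$ contribution stays of order one while the $F_+$ contribution is at most $n$, only $F_+$ affects the growth rate of the partial sums.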

Recall from Section 3.1 the definitions of lattice and non-lattice functionals.

Theorem 5.3 (Exact Large Deviations for Non-Lattice Functionals) Suppose that $\Phi$ satisfies
(DV3+) with an unbounded function $W$, and that $F \in L_\infty^{W_0}$ is a real-valued, strongly non-lattice
functional, with $\pi(F) = 0$ and $\sigma^2(F) \neq 0$. Then, for any $0 < c < F_{\max}$ and all $x \in \mathsf{X}$,

$$P_x\{S_n \ge nc\} \sim \frac{f_a(x)}{a\sqrt{2\pi n \sigma_a^2}}\, e^{-nJ(c)}, \qquad n \to \infty,$$

where $a > 0$ is the unique solution of the equation $\frac{d}{da}\Lambda(aF) = c$, $\sigma_a^2 := \frac{d^2}{da^2}\Lambda(aF) > 0$, $f_a(x)$
is the eigenfunction constructed in Theorem 3.1, and $J(c)$ is defined in (19). A corresponding
result holds for the lower tail.
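For orientation, the shape of this expansion can be compared with the classical Bahadur-Rao special case of independent standard Gaussian summands, where everything is explicit: $\Lambda(a) = a^2/2$, so $a = c$, $\sigma_a^2 = 1$, $J(c) = c^2/2$, and the eigenfunction is identically one. The sketch below is only that i.i.d. sanity check, not a verification of the theorem under (DV3+); the function names are hypothetical.

```python
import math

def exact_tail(n, c):
    """P{S_n >= n c} when S_n is a sum of n i.i.d. N(0,1) variables."""
    return 0.5 * math.erfc(c * math.sqrt(n) / math.sqrt(2.0))

def leading_term(n, c):
    """exp(-n J(c)) / (a sqrt(2 pi n sigma_a^2)) with a = c, sigma_a^2 = 1, J(c) = c^2/2."""
    return math.exp(-n * c * c / 2.0) / (c * math.sqrt(2.0 * math.pi * n))

ratios = [exact_tail(n, 0.5) / leading_term(n, 0.5) for n in (10, 100, 1000)]
```

The ratio of the exact tail to the leading term increases towards one as $n$ grows, which is what the symbol $\sim$ asserts.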

The proof of Theorem 5.3 is identical to that of the corresponding result in [32], based on 
the following simple properties of a Markov chain satisfying (DV3+). We omit properties P5 
and P6 since they are not needed here. 



Properties. Suppose $\Phi$ satisfies (DV3+) with an unbounded function $W$, and choose and
fix an arbitrary $x \in \mathsf{X}$ and a function $F \in L_\infty^{W_0}$ with zero asymptotic mean $\pi(F) = 0$ and
nontrivial asymptotic variance $\sigma^2 = \sigma^2(F) \neq 0$. Let $S_n$ denote the partial sums in (66) and
write $m_n(\alpha)$ for the moment generating functions

$$m_n(\alpha) := E_x[\exp(\alpha S_n)] = E_x[\exp(\alpha n \langle L_n, F \rangle)], \qquad n \ge 1,\ \alpha \in \mathbb{C}. \tag{68}$$

The proofs of the following properties are exactly as those of the corresponding results in [32],
and are based primarily on the multiplicative mean ergodic theorem, Theorem 3.1, and the
Taylor expansion of $\Lambda(F)$ given in Proposition 4.3. Observe that by Theorem 2.2 we have that
the Lyapunov function $V$ in (DV3+) satisfies $\pi(V^2) < \infty$.
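P1. For any $m > 0$ there is $\bar a > m$, $\bar\omega > 0$ and a sequence $\{\epsilon_n\}$ such that

$$m_n(\alpha) = \exp(n\Lambda(\alpha F))\,[f_\alpha(x) + |\alpha|\epsilon_n], \qquad n \ge 1,$$

and $|\epsilon_n| \to 0$ exponentially fast as $n \to \infty$, uniformly over all $\alpha \in \Omega(\bar a, \bar\omega)$, with $\Omega(\bar a, \bar\omega)$
as in Theorem 3.1.

P2. If $F$ is strongly non-lattice, then for any $m > 0$ and any $0 < \omega_0 < \omega_1 < \infty$, there is
$\bar a > m$ and a sequence $\{\epsilon'_n\}$ such that

$$m_n(\alpha) = \exp(n\Lambda(aF))\,\epsilon'_n, \qquad n \ge 1,$$

and $|\epsilon'_n| \to 0$ exponentially fast as $n \to \infty$, uniformly over all $\alpha = a + i\omega$ with $|a| \le \bar a$
and $\omega_0 \le |\omega| \le \omega_1$.

P3. If $F$ is lattice (or almost lattice) with span $h > 0$, then for any $\epsilon > 0$, as $n \to \infty$,

$$\sup_{\epsilon \le |\omega| \le 2\pi/h - \epsilon} |m_n(i\omega)| \to 0 \quad \text{exponentially fast.}$$

P4. For any $m > 0$ there exist $\bar a > m$ and $\bar\omega > 0$ such that the function $\Lambda(\alpha F)$ is analytic
in $\alpha \in \Omega(\bar a, \bar\omega)$, and for $\alpha = a \in \mathbb{R}$ we have $\Lambda(aF)|_{a=0} = \frac{d}{da}\Lambda(aF)|_{a=0} = 0$, and
$\frac{d^2}{da^2}\Lambda(aF)|_{a=0} = \sigma^2 > 0$. Moreover, $\sigma_a^2 := \frac{d^2}{da^2}\Lambda(aF)$ is strictly positive for real $a \in [-\bar a, \bar a]$.

P7. For each $m > 0$ there exist $\bar a > m$ and $\bar\omega > 0$ such that the eigenfunction $f_\alpha$ is analytic
in $\alpha \in \Omega(\bar a, \bar\omega)$, it satisfies $f_\alpha|_{\alpha=0} \equiv 1$, and it is strictly positive for real $\alpha$. Moreover,
there is some $\omega_0 \in (0, \bar\omega)$ such that

$$\delta(i\omega) := |\log f_{i\omega}(x) - i\omega \widehat{F}(x)| \le (\text{Const})\,\omega^2,$$

for all $|\omega| \le \omega_0$, where $\widehat{F}$ is as in Theorem 1.1.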

An analogous asymptotic expansion for lattice functionals is given in the next theorem; 
again, its proof is omitted as it is identical to that of the corresponding result in [32]. 

Theorem 5.4 (Exact Large Deviations for Lattice Functionals) Suppose $\Phi$ satisfies (DV3+)
with an unbounded function $W$, and that $F \in L_\infty^{W_0}$ is a real-valued, lattice functional with span
$h > 0$, $\pi(F) = 0$ and $\sigma^2(F) \neq 0$. Let $\{c_n\}$ be a sequence of real numbers in $(\epsilon, \infty)$ for some
$\epsilon > 0$, and assume (without loss of generality) that, for each $n$, $c_n$ is in the support of $S_n$.
Then, for all $x \in \mathsf{X}$,

P x {S n >nc n } k , e -nJ n (c n ) n^zc ( 69 ) 

where $\Lambda_n(a)$ is the log-moment generating function of $S_n$,

$$\Lambda_n(a) := \log E_x\big[e^{a S_n}\big], \qquad n \ge 1,\ a \in \mathbb{R},$$

each $a_n > 0$ is the unique solution of the equation $\frac{d}{da}\Lambda_n(a) = c_n$, and $J_n(c)$ is the convex dual
of $\Lambda_n(a)$,

$$J_n(c) := \Lambda_n^*(c) := \sup_{a \in \mathbb{R}} [ac - \Lambda_n(a)], \qquad n \ge 1,\ c \in \mathbb{R}.$$

A corresponding result holds for the lower tail.

Observe that the expansion (69) in the lattice case is slightly more general than the one in
Theorem 5.3. If the sequence $\{c_n\}$ converges to some $c > \epsilon$ as $n \to \infty$, then, as in [32], the $a_n$
also converge to some $a > 0$, and

PJS n >nc n } kfa{x ] e- nJi - c \ n-oo, 

where $\sigma_a^2 := \frac{d^2}{da^2}\Lambda(aF)$.
Appendix

A Drift Conditions and Multiplicative Regularity 

Lemma A.1 allows us to bound the expansive term $b I_C(x)$ in condition (DV3). We say that
a set $S \in \mathcal{B}$ is multiplicatively-special (m.-special) if for every $A \in \mathcal{B}^+$ there exists $\eta > 0$ such
that

$$\sup_{x \in \mathsf{X}} E_x\big[\exp\big(\eta\, \tau_A L_{\tau_A}(S)\big)\big] < \infty.$$

Lemma A.1 If $\Phi$ is $\psi$-irreducible, then every small set is m.-special.

Proof. Let $S$ be a small set, and fix $A \in \mathcal{B}^+$. For a given fixed $T > 0$, define the stopping
times $\{T_n : n \ge 0\}$ inductively via $T_0 = 0$, and

$$T_{n+1} = \inf\{t \ge T_n + T : \Phi(t) \in S\}, \qquad n \ge 0.$$

We consider the sequence of functions,

$$g_n(x) := E_x\Big[\exp\Big(\eta \int_{[0,\tau_A \wedge T_n)} I_S(\Phi(t))\,dt\Big)\Big], \qquad n \ge 1,$$

and we let $B_n = B_n(\eta) = \sup_{x \in \mathsf{X}} g_n(x)$, $n \ge 1$. Since $S$ is small, there exists $\epsilon > 0$, $T > 0$,
such that $P_x\{\tau_A > T_1\} \le 1 - \epsilon < 1$ for all $x \in S$. From the strong Markov property we then
have,



= E x 

< e r,T E 



(ta < Ti 1 



eX P^ /[0.T0 d *) E *m) [^(^ J[0,r,AT„) MHt)) dt)]l(T A > T\) 



< e" T + (e" T VSi(2r ? )P{r A >T 1 })5 n < e^ T + (e*> T ^B l (2r 1 )(l - e))B n 



for all $x \in S$, where the last bound uses Cauchy-Schwarz.

This gives an upper bound for $x \in S$, and the same bound also holds for all $x$ since
$g_n(x) \le \sup_{y \in S} g_n(y)$. Choosing $\eta > 0$ so small that $\beta := \big(e^{2\eta T} B_1(2\eta)(1 - \epsilon)\big)^{1/2} < 1$, we see
from induction that $\{B_n\}$ is a bounded sequence, and $\limsup_{n \to \infty} B_n \le (1 - \beta)^{-1} e^{\eta T}$. □

Proof of Theorem 2.5. 

Recall that, under (DV3), the stochastic process $(m(t), \mathcal{F}_t)$ given in (9) is a super-martingale.
That is, for any stopping time $\tau$,

$$E_x[m(\tau)] \le m(0) = v(x), \qquad x \in \mathsf{X}. \tag{70}$$

Fix any set $A \in \mathcal{B}^+$. An application of Lemma A.1 implies that there exist constants $b_1, b_2 <
\infty$, and $\eta_1 > 0$ such that for any stopping time $\tau$,



$$E_x\Big[\exp\Big(\sum_{s=0}^{\tau-1} [\eta_1 I_C(\Phi(s)) - b_1 I_A(\Phi(s))]\Big)\Big] \le \exp(b_2). \tag{71}$$



From (70), Jensen's inequality, and Hölder's inequality, for all sufficiently small $\eta > 0$, and
all finite $b_3 > 0$,



r-l 



r-l 



s=0 



exp(r ? y($(r)) + r? - 6 3 I A (*(a))]) 

s=0 

r-l 

cxp(r ? y(c|»(r)) + r?^[VF(<l>( S )) - I&Ic($(s))]) exp(r/ ^[±&I C (<1>( S )) - & 3 lU($( a ))]) 
exp(^2r/F($(r)) + 2?7^[VF($(s)) - bl c (®(s))} 

s=0 

r-l 

a [exp(277 ^[i6I c ($(s)) - 6 3 Ia($(s))]) 



< Ex 



x E T 



s=0 



r-l 



exp(2r/^[6I c ($(s)) - & 3 lU(*(s))]) 



s=0 
Setting $b_3 = b b_1 \eta_1^{-1}$ we obtain from this and (71), for all $\eta \le \eta_1 (2b)^{-1}$,

$$E_x\Big[\exp\Big(\eta V(\Phi(\tau)) + \eta \sum_{s=0}^{\tau-1} [W(\Phi(s)) - b_3 I_A(\Phi(s))]\Big)\Big] \le v_\eta(x) \exp(2\eta b b_2 \eta_1^{-1}), \qquad x \in \mathsf{X}. \tag{72}$$

Setting $\tau = \tau_A \wedge m$ for $m \ge 1$, and then letting $m \to \infty$ completes the proof. □
Proof of Theorem 2.2.

The construction of a Lyapunov function $V_*$ follows from the bounds given above, beginning
with (72) (note however that $W \equiv 1$ under (DV2)). Assume that the set $A \in \mathcal{B}^+$ is fixed, with
$V$ bounded on $A$. We assume moreover that $A$ is small; this is without loss of generality by
[34, Proposition 5.2.4 (ii)]. Fix $k \ge 0$, and define,

$$\sigma_A := \min\{i \ge 0 : \Phi(i) \in A\}, \qquad \tau := \sigma_A \wedge k.$$

Consideration of this stopping time in (72) gives the upper bound, for some $b_1 < \infty$,

$$E_x\big[I(\sigma_A \ge k) \exp\big(\eta V(\Phi(k)) + \tfrac12 \eta k\big)\big] \le b_1 v_\eta(x) e^{-\frac12 \eta k}, \qquad x \in \mathsf{X},\ k \ge 0,$$

and on summing both sides we obtain the pair of bounds,

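$$v_\eta(x) \le V_*(x) := E_x\Big[\sum_{k=0}^{\sigma_A} \exp\big(\eta V(\Phi(k)) + \tfrac12 \eta k\big)\Big] \le \frac{b_1}{1 - e^{-\frac12 \eta}}\, v_\eta(x), \qquad x \in \mathsf{X}.$$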

We now demonstrate that this function satisfies the desired drift condition: We have, 

TA 

PV*(x) = E x ^2exp(r]V(<S>(k)) + \r]k)\ < e~^V*{x) + b'l A (x) , 
k=i 

with $b' = \frac{b_1}{1 - e^{-\frac12 \eta}} \sup_{y \in A} v_\eta(y)$. This is indeed a version of (V4). □

Proposition A.2 Suppose that $\mathsf{X}$ is $\sigma$-compact and locally compact; that $P$ has the Feller
property; and that there exists a sequence of compact sets $\{K_n : n \ge 1\}$ satisfying (27): For
any compact set $K \subset \mathsf{X}$,

$$\sup_{x \in K} E_x\big[e^{n \tau_{K_n}}\big] < \infty.$$

Then, there exists a solution to the inequality,

$$\mathcal{H}(V) \le -\tfrac12 W + b I_C$$

such that $V, W : \mathsf{X} \to [1,\infty)$ are continuous, their sublevel sets are precompact, $C \in \mathcal{B}$ is
compact, and $b < \infty$.



Proof. Let $\{O_n : n \ge 1\}$ denote a sequence of open, precompact sets satisfying $O_n \uparrow \mathsf{X}$,
and $K_n \subset$ closure of $O_n \subset O_{n+1}$, $n \ge 1$. For each $n \ge 1$ we consider a continuous function
$s_n : \mathsf{X} \to [0,1]$ satisfying $s_n(x) = 1$ for $x \in O_n$, and $s_n(x) = 0$ for $x \in O_{n+1}^c$. We then define a
stopping time $\tau_n \ge 1$ through the conditional distributions,

P{r n >n| T n } = JJ(l-a n (*(t))), n > 1. 

«=i 

From the conditions imposed on $s_n$ we may conclude that $\tau_{K_n} \ge \tau_{O_n} \ge \tau_n \ge \tau_{O_{n+1}}$ for each
$n \ge 1$.

For $n \ge 1$, $m \ge 1$ we define $V_{n,m} : \mathsf{X} \to \mathbb{R}_+$ by,

r„-l 

x G X. 



K,m(a;) ^logEj, exp(^ (n - l)(l - s m ($(i))) 



Continuity of this function is established as follows: First, observe that under the Feller
property we can infer that $P_x\{\tau_n = k\}$ is a continuous function of $x \in \mathsf{X}$ for any $k \ge 1$. The
bound $\tau_n \le \tau_{K_n}$, $n \ge 1$, combined with (27) then establishes a form of uniform integrability
sufficient to infer the desired continuity.

Moreover, by the dominated convergence theorem we have $V_{n,m}(x) \downarrow 0$, $m \to \infty$, for each
$x \in \mathsf{X}$. Continuity implies that this convergence is uniform on compacta. We choose $\{m_n : n \ge
1\}$ so that $V_{n,m_n}(x) \le 1$ on $O_{n+1}$, and we define $V_n = V_{n,m_n}$. Letting $W_n = (n-1)(1 - s_{m_n})$,
we obtain the bound $\mathcal{H}(V_n) \le -W_n + 1$. Let $\{p_n\} \subset \mathbb{R}_+$ satisfy $\sum_{n \ge 1} p_n = 1$, $\sum p_n n = \infty$,
and define,

$$W := 1 + \sum p_n W_n, \qquad V := 1 + \sum p_n V_n.$$

Convexity of TL then gives, ~H(V) < V — W + 1. The functions W and V are evidently coercive 
and continuous. Hence the desired inequality is obtained with C = {x £ X : W(x) < \}. □ 



B v-Separable Kernels

The following result is immediate from the definition (24). 

Lemma B.1 Suppose that $\{P^n : n \in \mathbb{Z}_+\}$ is a positive semigroup, with finite spectral radius
$\xi > 0$. Then the inverse $[Iz - P]^{-1}$ admits the power series representation,

$$[Iz - P]^{-1} = \sum_{n=0}^{\infty} z^{-n-1} P^n, \qquad |z| > \xi,$$

where the sum converges in norm.

Lemma B.2 (i) is a simple corollary: 
Lemma B.2 Consider a positive semigroup $\{P^n : n \in \mathbb{Z}_+\}$ that is $\psi$-irreducible. Then:

(i) The spectral radius $\xi$ in $L_\infty^v$ of $\{P^n\}$ satisfies $\xi < b_0$ for a given $b_0 < \infty$ if and only
if there is a $b < b_0$, and a function $v_1 : \mathsf{X} \to [1,\infty)$ such that $v_1$ is equivalent to $v$,
and $P v_1 \le b v_1$.

(ii) The generalized principal eigenvalue $\lambda$ (see Section 2.4) satisfies $\lambda \le b < \infty$ if and
only if there is a measurable function $v_1 : \mathsf{X} \to (0,\infty)$ such that $P v_1 \le b v_1$.

Proof. Part (ii) is a consequence of [39, Theorem 5.1].

To see (i), suppose first that $b > \xi$, and set $v_1 = b[Ib - P]^{-1} v = \sum_{n=0}^{\infty} b^{-n} P^n v$. Then
$v_1 \in L_\infty^v$ by Lemma B.1, and $v \le v_1$ by construction. Moreover, it is easy to see that $v_1$
satisfies the desired inequality.

Conversely, if the inequality holds then for any $0 < \eta < 1$, $n \ge 1$,

$$(\eta^{-1} b)^{-n-1} P^n v_1 \le b^{-1} \eta^{n+1} v_1,$$

which shows that $|||[I\eta^{-1}b - P]^{-1}|||_{v_1} \le (1 - \eta)^{-1} \eta b^{-1}$. It follows that $\xi \le \eta^{-1} b$ since $v$ and $v_1$
are equivalent. Since $\eta < 1$ is arbitrary, this shows that $b \ge \xi$, and completes the proof. □
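The forward construction in part (i) can be illustrated on a finite state space, where every positive function is equivalent to every other. The sketch below uses a hypothetical $2 \times 2$ positive kernel with spectral radius about $0.573$, truncates the series $v_1 = \sum_n b^{-n} P^n v$ for $b = 0.8$, and compares $P v_1$ with $b(v_1 - v)$; the latter identity follows by shifting the summation index and yields $P v_1 \le b v_1$.

```python
def apply_kernel(P, v):
    """Compute the function Pv on a finite state space."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

P = [[0.5, 0.2], [0.1, 0.3]]   # hypothetical positive kernel, spectral radius about 0.573
v = [1.0, 2.0]                 # hypothetical weight function with v >= 1
b = 0.8                        # any constant above the spectral radius

v1 = [0.0, 0.0]
term = v[:]                    # holds b^{-n} P^n v, starting at n = 0
for n in range(500):
    v1 = [v1[i] + term[i] for i in range(2)]
    term = [t / b for t in apply_kernel(P, term)]

Pv1 = apply_kernel(P, v1)
```

The truncated sum already satisfies the drift inequality to within rounding error, because the neglected tail decays geometrically at rate $\xi / b$.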

The following result will be used below to construct $v$-separable kernels.

Lemma B.3 Suppose that $P$ is a positive kernel, and that there is a measure $\mu \in \mathcal{M}_1^v$ satisfying

$$P(x,A) \le \mu(A), \qquad x \in \mathsf{X},\ A \in \mathcal{B}.$$

Then $P^2$ is $v$-separable.

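Proof. Consider the bivariate measure, $\Gamma(dx,dy) = \mu(dx)P(x,dy)v(y)$, for $x,y \in \mathsf{X}$. Under
the assumptions of the proposition we have the upper bound, $\Gamma(dx,dy) \le v(y)\mu(dx)\mu(dy)$,
and hence there exists a density $r$ satisfying $r(x,y) \le v(y)$, $x,y \in \mathsf{X}$, and $\Gamma = r[\mu \times \mu]$. It
follows that for any $g \in L_\infty^v$ we have

$$Pg(x) = \int r(x,y) g(y) v^{-1}(y) \mu(dy), \qquad \text{a.e. } x \in \mathsf{X}\ [\mu].$$

For a given $\epsilon > 0$ the function $r$ can be approximated from below in $L_1(\mu \times \mu)$ by the
simple functions,

$$r_\epsilon(x,y) = \sum_{i=1}^{N} \alpha_i I_{A_i}(x) I_{B_i}(y) \le r(x,y), \qquad x,y \in \mathsf{X},$$

and

$$\int\!\!\int |r(x,y) - r_\epsilon(x,y)|\, \mu(dx)\mu(dy) \le \epsilon.$$

We then define

$$P_\epsilon(x,dy) = r_\epsilon(x,y) v^{-1}(y) \mu(dy), \qquad x,y \in \mathsf{X},$$

and $P_{\epsilon 2} := P P_\epsilon$. The latter kernel may be expressed $P_{\epsilon 2} = \sum_{i=1}^{N} s_i \otimes \nu_i$ with

$$s_i(x) := \alpha_i P(x, A_i), \qquad \nu_i(dy) = I_{B_i}(y) v^{-1}(y) \mu(dy), \qquad x,y \in \mathsf{X}.$$

We have $s_i \in L_\infty^v$ and $\nu_i \in \mathcal{M}_1^v$ for each $i$.
For any $g \in L_\infty^v$, $x \in \mathsf{X}$, we then have,

$$\begin{aligned}
|P_{\epsilon 2} g(x) - P^2 g(x)| &= |P[P_\epsilon g - Pg](x)| \\
&= \Big| \int P(x,dy) \Big\{ \int [r_\epsilon(y,z) - r(y,z)] g(z) v^{-1}(z) \mu(dz) \Big\} \Big| \\
&\le \int \mu(dy) \int |r_\epsilon(y,z) - r(y,z)|\, |g(z)|\, v^{-1}(z) \mu(dz) \\
&\le \|g\|_v \int\!\!\int |r_\epsilon(y,z) - r(y,z)|\, \mu(dy)\mu(dz) \le \epsilon \|g\|_v. \qquad □
\end{aligned}$$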

Lemma B.4 Suppose that (DV3) holds with $W$ unbounded. Fix $0 < \eta \le 1$, and consider any
measurable function $F$ satisfying

$$F_+ := \max(F,0) \in L_\infty^W; \qquad \lim_{r \to \infty} \|F_+ I_{C_W(r)^c}\|_W < \delta\eta. \tag{73}$$

We then have $|||I_{C_W(r)^c} P_f|||_{v_\eta} \to 0$, exponentially fast, as $r \to \infty$.

PROOF. For simplicity we consider only 77 = 1. Choosing ro > 1 so that ||F + I Clv ( ro ) C ||w = 
<5o < <5, we have, 

< e V-(«5-<5o)^ +b < e y-(5-5o)r+6 Qn CW(ro )c > 

and hence Pc w (r) cP /L < e- (5 ~ 5 ^ r+b for all r > 1. □ 

Lemma B.5 Suppose that (DV3+) holds with $W$ unbounded. Fix $0 < \eta \le 1$, and consider
any measurable function $F$ satisfying (73). Then $(P_f)^{2T_0+2}$ is $v_\eta$-separable.

Proof. For simplicity we present the proof only for $\eta = 1$. We define the truncation,

$$P_r := (I_{C_W(r)} P_f)^{T_0+1}.$$

For each $r \ge 1$ we have

$$P_r(x,A) \le \beta'_r(A) := \int_{C_W(r)} \beta_r(dx) P_f(x,A), \qquad x \in \mathsf{X},\ A \in \mathcal{B}.$$

It then follows from Lemma B.3 that the kernel $P_r^2$ is $v$-separable.

Finally, applying Lemma B.4 we may conclude that $|||(P_f)^{2T_0+2} - P_r^2|||_v \to 0$, $r \to \infty$, which
implies that $(P_f)^{2T_0+2}$ is also $v$-separable. □






Proof of Theorem 2.4. 

(a) =>■ (b). When (DV3) holds we can conclude from Lemma B.4 that |||P — Ic w (r)P\ v ~* 
as r — > oo. It follows that — Ic w (r)P T \ V() ~^ as r — > oo for any T > 1. In particular, this 
holds for T = To. Under the separability assumption on {I Cw ^P T ° : r > 1} it then follows 
that P T ° is f-separable. 

(b) =>• (a). We first show that each of the sets {C vo (r) : r > 1} is small. Under the assump- 
tions of (b) we may find, for each e > 0, an integer N > 1, functions {sj : 1 < i < N} C 
and probability measures {z^ : 1 < i < iV} C -M^ such that, with K = Sj ® z/j, 

lll^ T °-^IIU<e. (74) 

This gives for any r > 1, 

|1 - ^Si(x)| = (x) - Kl (x)| < ev (x) < er, x G C, (r). 

Let A 6 B be a small set with fi{A c ) < e for each i From the bound above and using 
similar arguments, 

P T °(x,A c ) < K(x,A c ) +ev (x) 

< J2i Si{x)vi{A c ) + ev (x) 

< (1 + er)e + er, x G C„ (r) . 

It follows that for any r > 1, we may find a small set A(r) such that P T ° (x, yl(r)) > |, for 
x G C„ (r). It then follows from [34, Proposition 5.2.4] that Cu (r) is small. 

We now construct a solution to the drift inequality in (DV3). Using finite approximations 
as in (74), we may construct, for each n > 1, an integer r n > n such that 

l(^ccj T iL<|P To /c=jiu< e - 2 " To . 

Since the norm is submultiplicative, this then gives the bound, 

\t(PIccJ% < boe~ 2nk , k>0, 



where b := (||P|||„ ) T °. 

We then define for each n > 1, 



k=0 



From the previous bound on \\(PIc^ ) fe ||L we have the pair of bounds, 



r„ ' "If 

1 e~ 2n 

'Jn\\v <bo- -, and ||-Pu n ||„ < b - -• (75) 

1 — e~ n 1 — e~ n 



Finally, we set 

V := log(l + E~i^) 
W := bI c -H(V), 
where C = C v (r) for some r, and the constants b and r are chosen so that W(x) > 1 for all 
x £ X. The bounds (75) together with the lower bound v n > VQe n Ic^ imply that 

e V(x) 

lim inf exp(— H(V)) = lim inf — ... - , = oo, 

r-*oo xeC v (r) c r^oo xeC v (r) c (Pe v ) (x) 

which implies the existence of r and b satisfying these requirements. □ 

In much of the remainder of the appendix we replace (DV3+) with the following more 
general condition: 



(i) The Markov process $\Phi$ is $\psi$-irreducible, aperiodic, and it satisfies
condition (DV3) with some Lyapunov function $V : \mathsf{X} \to [1,\infty)$, and an
unbounded function $W : \mathsf{X} \to [1,\infty)$. (76)

(ii) There exists $T_0 > 0$ such that $I_{C_W(r)} P^{T_0}$ is $v$-separable for each $r < \infty$.



Theorem 2.4 states that this is roughly equivalent to (DV3+) with an unbounded function 
W. In fact, we do have an analogous upper bound for P T °: 

Lemma B.6 Suppose that the conditions of (76) hold. Then, for each $r \ge 1$, $\epsilon > 0$, there is
a positive measure $\beta_{r,\epsilon} \in \mathcal{M}_1^v$ such that

$$P^{T_0} h(x) \le \beta_{r,\epsilon}(h) + \epsilon \|h\|_v, \qquad x \in C_W(r),\ h \in L_\infty^v.$$

Proof. We apply the approximation (74) used in the proof of Theorem 2.4, where $\{s_i : 1 \le
i \le N\} \subset L_\infty^v$ are non-negative valued, and $\{\nu_i : 1 \le i \le N\} \subset \mathcal{M}_1^v$ are probability measures.
We may assume that the $\{s_i\}$ satisfy the bound $1 = P^{T_0}(x,\mathsf{X}) \ge \sum s_i(x) - 1$, $x \in C_W(r)$, and
it follows that we may take $\beta_{r,\epsilon} = 2\sum_{i=1}^{N} \nu_i$. □

The following result is proven exactly as Lemma B.5, using Lemma B.6. 

Lemma B.7 Suppose that the conditions of (76) hold. Fix < r] < 1, and consider any 
F G satisfying (73). Then (P/) 2T ° is v„-separable. 



C Properties of $\Lambda$ and $\Lambda^*$

In this section we obtain additional properties of $\Lambda$ and $\Lambda^*$. One of the main goals is to
establish approximations of $\Lambda(G)$ through bounded functions when $G$ is possibly unbounded.
Similar issues are treated in [13, Chapter 5] where a tightness condition is used to provide
related approximations.

Lemma C.1 For a $\psi$-irreducible Markov chain:

(i) The log-generalized principal eigenvalue $\Lambda$ is convex on the space of measurable
functions $F : \mathsf{X} \to (-\infty,\infty]$.

(ii) The log-spectral radius $\Xi$ is convex on the space of measurable functions $F : \mathsf{X} \to
(-\infty,\infty]$.

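Proof. The proofs of (i) and (ii) are similar, and both proofs are based on Lemma B.2. We
provide a proof of (ii) only.

Fix $F_1, F_2 \in L_\infty^{W_0}$, $\eta, \theta \in (0,1)$, and let $b_i = \eta^{-1}\xi(F_i)$, $i = 1,2$. Lemma B.2 implies that
there exist functions $\{v_1, v_2\}$ equivalent to $v$, and satisfying, with $V_i = \log v_i$,

$$E_x[\exp(F_i(\Phi(0)) + V_i(\Phi(1)))] := P_{f_i} v_i(x) \le b_i v_i(x), \qquad i = 1,2,\ x \in \mathsf{X}.$$

We then define

$$F_\theta = \theta F_1 + (1 - \theta) F_2, \qquad V_\theta = \theta V_1 + (1 - \theta) V_2,$$

so that by Hölder's inequality, with $v_\theta = e^{V_\theta}$,

$$\begin{aligned}
P_{f_\theta} v_\theta(x) &= E_x[\exp(\theta[F_1(\Phi(0)) + V_1(\Phi(1))] + (1-\theta)[F_2(\Phi(0)) + V_2(\Phi(1))])] \\
&\le E_x[\exp(F_1(\Phi(0)) + V_1(\Phi(1)))]^{\theta}\, E_x[\exp(F_2(\Phi(0)) + V_2(\Phi(1)))]^{1-\theta} \\
&\le b_1^{\theta} b_2^{1-\theta} v_\theta(x), \qquad x \in \mathsf{X}.
\end{aligned}$$

The function $v_\theta$ is equivalent to $v$. Consequently, we may apply Lemma B.2 once more to
obtain that $\xi(F_\theta) \le b_1^{\theta} b_2^{1-\theta}$. Taking logarithms then gives,

$$\Xi(F_\theta) \le \theta \log(b_1) + (1 - \theta) \log(b_2) = \theta\, \Xi(F_1) + (1 - \theta)\, \Xi(F_2) - \log(\eta).$$

This completes the proof since $0 < \eta < 1$ is arbitrary. □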
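The convexity statement can be probed numerically in the simplest setting of a two-state chain, where the log-spectral radius of the kernel $e^{F(x)}P(x,y)$ has a closed form. The sketch below draws random pairs $F_1, F_2$ and mixing weights $\theta$, and records the smallest value of $\theta\,\Xi(F_1) + (1-\theta)\,\Xi(F_2) - \Xi(F_\theta)$. The kernel, the sampling range and the seed are hypothetical choices; this is a sanity check in a finite-state setting, not a substitute for the proof.

```python
import math
import random

def log_spectral_radius(P, F):
    """Log of the largest eigenvalue of the 2x2 kernel e^{F(x)} P(x, y)."""
    m00, m01 = math.exp(F[0]) * P[0][0], math.exp(F[0]) * P[0][1]
    m10, m11 = math.exp(F[1]) * P[1][0], math.exp(F[1]) * P[1][1]
    disc = (m00 - m11) ** 2 + 4.0 * m01 * m10
    return math.log((m00 + m11 + math.sqrt(disc)) / 2.0)

P = [[0.7, 0.3], [0.4, 0.6]]       # hypothetical two-state transition kernel
rng = random.Random(1)
worst_gap = float("inf")
for _ in range(1000):
    F1 = [rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)]
    F2 = [rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)]
    theta = rng.random()
    Fmix = [theta * F1[i] + (1.0 - theta) * F2[i] for i in range(2)]
    gap = (theta * log_spectral_radius(P, F1)
           + (1.0 - theta) * log_spectral_radius(P, F2)
           - log_spectral_radius(P, Fmix))
    worst_gap = min(worst_gap, gap)
```

A non-negative `worst_gap` (up to rounding) is what convexity of $\Xi$ predicts.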
The following result establishes a form of upper semi-continuity for the functional $\Lambda$.

Lemma C.2 Suppose that $\Phi$ is $\psi$-irreducible, and consider a sequence $\{F_n\}$ of measurable,
real-valued functions on $\mathsf{X}$. Suppose there exists a measurable function $F : \mathsf{X} \to \mathbb{R}$ such
that $F_n \uparrow F$, as $n \uparrow \infty$. Then the corresponding generalized principal eigenvalues converge:
$\Lambda(F_n) \to \Lambda(F)$, as $n \uparrow \infty$.

Proof. It is obvious that $\limsup_{n \to \infty} \Lambda(F_n) \le \Lambda(F)$. To complete the proof we establish a
bound on the limit infimum.

Under the assumptions of the proposition we have $P_{f_n}^T \ge P_{f_1}^T$, for any $T \ge 1$, $n \ge 1$. It
follows that we can find an integer $T_0 \ge 1$, a function $s : \mathsf{X} \to [0,1]$, and a probability $\nu$ on $\mathcal{B}$
satisfying $\nu(s) > 0$ and

$$P_{f_n}^{T_0} \ge s \otimes \nu, \qquad 1 \le n \le \infty.$$

Let $(f_n, \lambda_n)$ denote the Perron-Frobenius eigenfunction and generalized principal eigenvalue
for $P_{f_n}$, normalized so that $\nu(f_n) = 1$ for each $n$. For each $n \ge 1$ we have the upper bound,
$P_{f_n} f_n \le \lambda_n f_n$. This gives a lower bound on the $\{f_n\}$:

$$f_n \ge \lambda_n^{-T_0} P_{f_n}^{T_0} f_n \ge \lambda_n^{-T_0} \nu(f_n) s = \lambda_n^{-T_0} s.$$

Let $h = \liminf_{n \to \infty} f_n$, $\lambda = \liminf_{n \to \infty} \lambda_n$. Then, by Fatou's Lemma, $P_f h \le \lambda h$. We also
have $\nu(h) \le 1$ by Fatou's Lemma, and the lower bound $h \ge \lambda^{-T_0} s$. It follows from Lemma B.2
that $\Lambda(F) \le \log(\lambda)$. □
In applying Lemma C.2 we typically assume that suitable regularity conditions hold so that
$\Xi(F) = \Lambda(F)$. Under a finiteness assumption alone we obtain a complementary continuity
result for certain classes of decreasing sequences of functions. One such result is given here:

Lemma C.3 Suppose that \\P\\ V < oo, and that F: X — > R is measurable, with < oo. 

Then, with $F_n := \max(F, -n)$ we have, $\Xi(F_n) \downarrow \Xi(F)$ as $n \uparrow \infty$.

Proof. This follows immediately from the approximation, $|||P_{f_n} - P_f|||_v \le e^{-n} |||P|||_v$, $n \ge 1$. □

To establish a tight approximation for A(M), where M = logm is as in the proof Theo- 
rem 4.2, we will approximate M by bounded functions. 

Proposition C.4 Suppose that \\P\\ V < oo, and that F: X — > R is measurable, with < oo, 
and A(F) = Then, there exists a sequence {n k : k > 1} such that with F k := FI{— n k < 

F < k} we have: 

$$\Lambda(F_k) \to \Lambda(F) \quad \text{and} \quad \Xi(F_k) \to \Xi(F) \quad \text{as } k \to \infty.$$

Proof. Let $F_k^0 := F I\{F \le k\}$. From Lemma C.2 we have $\Lambda(F_k^0) \uparrow \Lambda(F)$, $k \to \infty$. It follows
that we also have $\Xi(F_k^0) \uparrow \Xi(F)$, $k \to \infty$, since $\Xi$ dominates $\Lambda$.

We now apply Lemma C.3: For each $k \ge 1$ we may find $n_k \ge 1$ such that with $F_k :=
F I\{-n_k \le F \le k\}$,

A(F°) < A(F k )<A(F°) + k-\ 

< -(F^S^ + AT 1 , fc>l. □ 

The following proposition implies that $\Lambda$ is tight in a strong sense under (DV3+):

Proposition C.5 Suppose that the conditions of (76) hold. Then, for any increasing sequence
of measurable sets $K_n \uparrow \mathsf{X}$, and any $G \in L_\infty^{W_0}$,

(i) $\lim_{n \to \infty} \Lambda(G I_{K_n^c}) = 0$

(ii) $\lim_{n \to \infty} \Lambda(G I_{K_n}) = \Lambda(G)$

The proof is postponed until after the following lemma.

Lemma C.6 Suppose that the conditions of (76) hold, and consider any increasing sequence
of measurable sets $K_n \uparrow \mathsf{X}$, and any $G \in L_\infty^{W_0}$. Then, on letting $g_n = \exp(I_{K_n^c} G)$, $n \ge 1$, we
have

$$|||P^{T_0} P_{g_n} - P^{T_0+1}|||_v \to 0, \qquad n \to \infty.$$

Proof. We may assume without loss of generality that $G \ge 0$. As usual, we set $g = e^G$.

Under (76) we have $|||P_{g_n}|||_v \le |||P_g|||_v < \infty$, $n \ge 1$. Consequently, given Lemma B.4, it is
enough to show that for any $r \ge 1$,

$$|||I_{C_W(r)} [P^{T_0} P_{g_n} - P^{T_0+1}]|||_v \to 0, \qquad n \to \infty.$$

To see this, observe that for any $h \in L_\infty^v$, $x \in \mathsf{X}$,

\lc w (r)[P T °P gn -P To+1 Mx)\ = \l Cw{r) [P T H KnC [P g -P\\h{x)\ 

< I Cw{r) P T H KnC [P g -P]\h\{x) 

< \\h\\ v \\P g \\ v {I Cw{r) P T H K c n )v{x) 

< WhUlPgUPrA^v) + CV] , 

where the measure $\beta_{r,\epsilon} \in \mathcal{M}_1^v$ is given in Lemma B.6. Consequently,

$$\limsup_{n \to \infty} |||I_{C_W(r)} [P^{T_0} P_{g_n} - P^{T_0+1}]|||_v \le \epsilon\, |||P_g|||_v.$$

This proves the result since $\epsilon > 0$ is arbitrary. □

Proof of Proposition C.5. To see (i), consider any $G \in L_\infty^{W_0}$ and any sequence of
measurable sets $K_n \uparrow \mathsf{X}$. We assume without loss of generality that $G \ge 0$.

Fix any b > 1, and define for n > 1, G n = (To + l)6IxcG. In view of Lemma C.6, given 
any A > 0, we may find n > 1 such that the spectral radius of the semigroup generated by 
the kernel P n := P T °P gn satisfies £ n < e A . With n, A fixed, we then have for some b n < 00, 
Pn v ^ b n e kA v iov k > 1. This has the sample path representation, 



K 

exp(£ G n ($((T + l)i - l)))u((r + !)£:))] < b n e kA v(x), 



i=i 



x€X,k>l. 



Denote by ho^(x) the expectation on the left hand side. We then have, for each j > 1, 

h j>k (x) := P J '/io )fc (x) < b n e kA (\\P\iyv(x) , x G X. 
Moreover, each of these functions has a sample path representation, 



hj,k(x) 



K 

exp(j2 Gn(Hj - 1 + (To + l)i)j)v($(j + (T + l)k) 



i=i 



a; G X, j > 1, jfe > 1. 



We then obtain the following bound using Holder's inequality, 

(T +l)(fc+l)-l 



exp( £ 6I i ,c($W)G($«))«($((To + l)(fc + 1)))" 

i=T 

< (fljio Ex[«p(eJ=i - 1 + (r + i)i)))t>(*((T + i)(* + i)))]) (To+1) " 

< ll|P|IL(nJ=o Ex [exp(Ej=i Gn(*(j - 1 + (To + l)t)))«(*0" + (^o + 1)*))^ 



(To+1)- 1 



< b n (\\Pl\ v ) To+1 e kA v(x), x€X,k>l. 



We conclude that A(J. K cG) < A/(T + 1). Since A > is arbitrary, it follows that A(J. K cG) -»■ 0, 
n — > 00. 
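To see (ii), fix $\theta \in (0,1)$, and obtain the following bound using convexity,

$$\begin{aligned}
\Lambda(\theta G) &= \Lambda\big(\theta I_{K_n} G + (1 - \theta)\,\theta(1-\theta)^{-1} I_{K_n^c} G\big) \\
&\le \theta \Lambda(I_{K_n} G) + (1 - \theta) \Lambda\big(\theta(1-\theta)^{-1} I_{K_n^c} G\big).
\end{aligned}$$

From (i) we conclude that

$$\Lambda(\theta G) \le \theta \liminf_{n \to \infty} \Lambda(I_{K_n} G), \qquad 0 < \theta < 1,$$

which gives $\Lambda(G) \le \liminf_{n \to \infty} \Lambda(I_{K_n} G)$. To obtain the reverse inequality we argue similarly:

$$\Lambda(I_{K_n} G) \le \theta \Lambda(\theta^{-1} G) + (1 - \theta) \Lambda\big(-(1-\theta)^{-1} I_{K_n^c} G\big),$$

which shows that

$$\limsup_{n \to \infty} \Lambda(I_{K_n} G) \le \theta \Lambda(\theta^{-1} G), \qquad 0 < \theta < 1.$$

This shows that $\Lambda(I_{K_n} G) \to \Lambda(G)$ as claimed. □

Proposition C.5 allows us to broaden the class of functions for which $\Xi$ is finite-valued.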

Proposition C.7 Suppose that the conditions of (76) hold. Then, there exists $W_1 : \mathsf{X} \to
[1,\infty)$ satisfying the following:

(i) W €L£i, md^GC- 

(ii) $\sup\{V(x) : x \in C_{W_1}(r)\} < \infty$ for each $r \ge 1$;

(iii) $\Xi(W_1) < \infty$.

If the state space $\mathsf{X}$ is $\sigma$-compact, then we may assume that $W_1$ is also coercive.

Proof. Fix a sequence of measurable sets satisfying $K_n \uparrow \mathsf{X}$, with $\sup_{x \in K_n} V(x) < \infty$ for
each $n$. Proposition C.5 implies that we may find, for each $k \ge 1$, an integer $n_k \ge 1$ such that
$\Xi(2^{k+1} I_{K_{n_k}^c} W_0) \le 1$. We then define

$$W_1 = \Big(1 + \sum_{k=1}^{\infty} I_{K_{n_k}^c}\Big) W_0.$$

The functional $\Xi$ is convex by Lemma C.1, which gives the bound,

$$\Xi(W_1) \le \tfrac12 \Xi(2W_0) + \sum_{k=1}^{\infty} 2^{-k-1}\, \Xi\big(2^{k+1} I_{K_{n_k}^c} W_0\big) \le \tfrac12 \big(1 + \Xi(2W_0)\big) < \infty.$$

To see that W\ G L^, we apply Lemma 2.9. 

Finally, if X is a-compact, then the {K n } may be taken to be compact sets, which then 
implies the coercive property for W\. □ 
We have the following useful corollary. The proof is routine, given Proposition C.7 and 
Proposition B.2 (i); see also [2, Theorem 2.4]. 

Lemma C.8 Suppose that the conditions of (76) hold. Then, for any N < oo, there exists 
ro > 1, 6o < oo, such that with r = Tc v ( n) ), 

h x exp 

We now turn to properties of the dual functional $\Lambda^*$ defined in (64). The continuity results
stated in Proposition C.5 lead to the following representation.

Proposition C.9 Suppose that the conditions of (76) hold. Let $\Theta$ be a linear functional on
$L_{\infty,2}^{W_0}$ satisfying $\Lambda^*(\Theta) < \infty$. Then $\Theta$ may be represented as,

$$\langle \Theta, G \rangle = \nu(G), \qquad G \in L_\infty^{W_0},$$

where $\nu \in \mathcal{M}_1^{W_0}$ is a probability measure.

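Proof. We proceed in several steps, making repeated use of the bound,

$$\langle \Theta, G \rangle \le \Lambda^*(\Theta) + \Lambda(G) < \infty, \qquad G \in L_{\infty,2}^{W_0}. \tag{77}$$

First note that on considering constant functions in (77) we obtain,

$$\Lambda^*(\Theta) \ge \sup_c [\langle \Theta, c \rangle - \Lambda(c)] = \sup_c [\langle \Theta, 1 \rangle - 1] c.$$

It is clear that finiteness of $\Lambda^*$ implies that $\langle \Theta, 1 \rangle = 1$. Next, consider any $G : \mathsf{X} \to \mathbb{R}_+$ with
$G \in L_\infty^{W_0}$. Then, since $\Lambda(cG) \le 0$ for $c \le 0$,

$$\Lambda^*(\Theta) \ge \sup_c [\langle \Theta, cG \rangle - \Lambda(cG)] \ge \sup_{c < 0} \langle \Theta, G \rangle c.$$

We conclude that $\langle \Theta, G \rangle \ge 0$ for $G \ge 0$.

Consider now a set $A \in \mathcal{B}$ of $\psi$-measure zero. Then $\Lambda(c I_A) = 0$ for any $c \ge 0$, and we can
argue as above using (77) that $\infty > \Lambda^*(\Theta) \ge \sup_{c > 0} \langle \Theta, I_A \rangle c$, which shows that $\langle \Theta, I_A \rangle = 0$.

Finally, we demonstrate that $\Theta$ defines a countably additive set function on $\mathcal{B}$. Let $\{A_i\} \subset
\mathcal{B}$ denote disjoint sets, and let $G_n = \sum_{i=n+1}^{\infty} I_{A_i}$. Then $0 \le G_n \le 1$ everywhere, and $G_n \downarrow 0$.
Proposition C.5 implies that $\Lambda(bG_n) \to 0$, $n \to \infty$, for any $b \ge 1$. Consequently,

$$\Lambda^*(\Theta) \ge \limsup_{n \to \infty} [\Theta(bG_n) - \Lambda(bG_n)] = b \limsup_{n \to \infty} \Theta(G_n).$$

It follows that $\limsup_{n \to \infty} \Theta(G_n) = 0$, which implies that $\Theta$ defines a countably additive set
function, so that $\Theta$ is in fact a probability measure. □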
More generally, we define $\Lambda^*$ for bivariate probability measures $\Gamma$ not necessarily in $\mathcal{M}_{1,2}^{W}$
using the same definition as in (54). Recall from Lemma 4.11 that the two marginals of $\Gamma$
agree whenever $\Lambda^*(\Gamma) < \infty$. Proposition C.10 provides further structure.

Proposition C.10 For any probability measure $\Gamma$ on $(\mathsf{X} \times \mathsf{X}, \mathcal{B} \times \mathcal{B})$ with first and second
marginal equal to some $\pi$,

$$\Lambda^*(\Gamma) \le H(\Gamma \,\|\, \pi \odot P), \tag{78}$$

and, moreover,

$$\Lambda^*(\Gamma) = H(\Gamma \,\|\, \pi \odot P) = \infty \quad \text{for } \Gamma \notin \mathcal{M}_{1,2}^{W}. \tag{79}$$



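Proof. If we view $W$ as a function on $\mathsf{X} \times \mathsf{X}$ with $W(x,y) = W(x)$, $x,y \in \mathsf{X}$, then we have
the bound, for all $\epsilon > 0$, $n \ge 1$,

$$\epsilon \langle \Gamma, W \wedge n \rangle \le \Lambda(\epsilon (W \wedge n)) + \Lambda^*(\Gamma) \le \Lambda(\epsilon W) + \Lambda^*(\Gamma).$$

Lemma B.5 shows that $\Lambda(\epsilon W) < \infty$ for $\epsilon > 0$ sufficiently small, and this gives (79).

Define $\check{P}$ through the decomposition $\Gamma = \pi \odot \check{P}$, and let $\check{E}$ denote the expectation for the
Markov chain with transition kernel $\check{P}$. We assume that $\check{P}$ is of the form

$$\check{P}(x,dy) = m(x,y) P(x,dy), \qquad x,y \in \mathsf{X},$$

and set $M = \log(m)$, since otherwise the relative entropy is infinite and there is nothing to
prove. We then have, for any $G \in L_{\infty,2}^{W_0}$,

$$\begin{aligned}
\Lambda(G) &= \lim_{T \to \infty} \tfrac{1}{T} \log\big(E_x[\exp(T \langle L_T, G \rangle)]\big) \\
&= \lim_{T \to \infty} \tfrac{1}{T} \log\big(\check{E}_x[\exp(T \langle L_T, G - M \rangle)]\big) \\
&\ge \limsup_{T \to \infty} \tfrac{1}{T} \check{E}_x\big[T \langle L_T, G - M \rangle\big] && \text{(Jensen's inequality)} \\
&= \langle \Gamma, G - M \rangle \quad \text{a.e. } x \in \mathsf{X}\ [\pi], && \text{(mean ergodic theorem for } \check{P}\text{)}
\end{aligned}$$

where the application of the mean ergodic theorem is justified by the $f$-norm ergodic theorem
[34, Theorem 14.0.1]. The integrability conditions required in this result are obtained as
follows. First, recall that $\Gamma(|G|) < \infty$ when $\Lambda^*(\Gamma)$ is finite and $G \in L_\infty^{W_0}$. Also, as in the proof
that $H(\Gamma \,\|\, \pi \odot P) \ge 0$, one can show that $\Gamma(M_-) < \infty$, where $M_- := |M \wedge 0|$. Consequently,
$(M - G)^-$ is $\Gamma$-integrable, which is what is required in the mean ergodic theorem.
The above bound may be interpreted as,

$$H(\Gamma \,\|\, \pi \odot P) = \langle \Gamma, M \rangle \ge \langle \Gamma, G \rangle - \Lambda(G).$$

Taking the supremum over all $G \in L_{\infty,2}^{W_0}$ gives (78). □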
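The inequality (78) can be checked numerically on a two-state space, under the assumption that for a bivariate function $G$ the quantity $\Lambda(G)$ equals the log Perron-Frobenius eigenvalue of the matrix $P(x,y)e^{G(x,y)}$. In the sketch below, `Pc` plays the role of the alternative kernel, `Gamma` is the bivariate law built from its invariant distribution, and `H` is the relative entropy; all numerical values and names are hypothetical.

```python
import math
import random

def perron_eigenvalue(M):
    """Largest eigenvalue of a 2x2 matrix with positive entries."""
    disc = (M[0][0] - M[1][1]) ** 2 + 4.0 * M[0][1] * M[1][0]
    return (M[0][0] + M[1][1] + math.sqrt(disc)) / 2.0

P = [[0.7, 0.3], [0.4, 0.6]]        # reference kernel (hypothetical)
Pc = [[0.2, 0.8], [0.5, 0.5]]       # alternative kernel (hypothetical)
pi0 = Pc[1][0] / (Pc[0][1] + Pc[1][0])
pic = [pi0, 1.0 - pi0]              # invariant law of Pc
Gamma = [[pic[x] * Pc[x][y] for y in range(2)] for x in range(2)]

# Relative entropy of Gamma with respect to the law pic(x) P(x, y).
H = sum(Gamma[x][y] * math.log(Pc[x][y] / P[x][y]) for x in range(2) for y in range(2))

def dual_objective(G):
    """<Gamma, G> - Lambda(G) for a bivariate function G on the two-state space."""
    M = [[P[x][y] * math.exp(G[x][y]) for y in range(2)] for x in range(2)]
    pairing = sum(Gamma[x][y] * G[x][y] for x in range(2) for y in range(2))
    return pairing - math.log(perron_eigenvalue(M))

rng = random.Random(2)
best = max(dual_objective([[rng.uniform(-3.0, 3.0) for _ in range(2)] for _ in range(2)])
           for _ in range(2000))
G_star = [[math.log(Pc[x][y] / P[x][y]) for y in range(2)] for x in range(2)]
```

Random choices of $G$ never exceed `H`, while the choice $G = M = \log(m)$ attains it, matching the role played by $M$ in the proof.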